diff --git a/.nojekyll b/.nojekyll index c4092d48..32bfb3f8 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -982ab178 \ No newline at end of file +cfd85cdb \ No newline at end of file diff --git a/Machine-Learning-Systems.pdf b/Machine-Learning-Systems.pdf index 275b6c8d..469d385c 100644 Binary files a/Machine-Learning-Systems.pdf and b/Machine-Learning-Systems.pdf differ diff --git a/contents/about.html b/contents/about.html index da42b245..801d1954 100644 --- a/contents/about.html +++ b/contents/about.html @@ -44,15 +44,15 @@ \x20\x20\x20\x20\x20\x20\x20Previous\x20Conversations\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20New\x20Chat\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20Delete\x20all\x20Chats\x20\x20\x20\x20\x20','dir','ⅅ','n-acro-noun','|[.]|$)|','chem','⫈','#Person','token_avaya','unicode|punctuation|whitespace|acronyms','%TLDS%','ƒ','|possessives|adverbs|nouns|verbs','our-full-support','polish-male','sixty','textContent','⤶','quick-singular','♂','#Honorific\x20#FirstName\x20[#Singular]','scrollY','parseLinkDestination','_multiCache','⩚','magenta','⅘','addWords','[(stop|start|finish|help)]\x20#Gerund','⟼','2:un¦oken:eak¦ought:eek¦oven:eave¦1ne:o¦1own:ly¦1den:de¦1in:ay¦2t:am¦2n:ee¦3en:all¦4n:rive,sake,take¦5n:rgive','Ų','expandLexicon','‘‛❛❜’','╡','(#Adjective\x20&&\x20@hasComma)\x20#Adjective','∃','2-tagYear','⇵','classic','|---------|-----|\x0a','#Copula\x20[there]\x20.','ˆ','arent','в','bldg','ĵ','^even\x20(if|though)','fifty','^#Verb$','findIndex','special-comittee','est','ID\x20is\x20required\x20for\x20updating\x20content.','⋧','getElementById','f-as-gerund','lines','image','⊧','Honorific-TitleCase','^[(shut|close|open|start|stop|end|keep)]\x20#Determiner\x20#Noun','clientSide','
\x0a',' ','i\x20am','𝓀','^(want|wants|wanted)\x20to\x20#Infinitive$','stopPropagation','^(does|do)\x20#Infinitive$','[about\x20to]\x20#Adverb?\x20#Verb','(?:','for','3-[foo-place]','𝔭','ffl','⋩','maleable','town','senior?\x20vice?\x20president\x20of\x20#Noun+','settle','to-swears','ctrlKey','NumberRange','directorate','usafa','^(if|unless|while|but|for|per|at|by|that|which|who|from)','ö','innerHeight','tly','peri','second-noun','that-were-growing','#popover-container','caption','name','been','#ProperNoun\x20#Organization','similarTexts','⁄','getRules','ªÀÁÂÃÄÅàáâãäåĀāĂ㥹ǍǎǞǟǠǡǺǻȀȁȂȃȦȧȺΆΑΔΛάαλАаѦѧӐӑӒӓƛæ','⫏','demonym-currency','Π','ever\x20since','936020SRYfzH','#Modal\x20(#Adverb|not)+?\x20[have]\x20(#Adverb|not)+?\x20[had]\x20(#Adverb|not)+?\x20#Verb','^[#Infinitive]\x20(is|was)','do-not-be-confused',']','club','╓','Unit','message-container','shoreline','⥣','⦍','ŭ','emitNumericEntity','match','\x5c-(?!--(?:[^-]|$))(?:-*)|','org-abbrv','will\x20want','newline','1-freeze-lexicon','^has\x20been\x20#Passive','sus','importScripts','slowly-adj','╠','[had]\x20#Noun+\x20#PastTense','​','topics','^will\x20#Infinitive\x20#Gerund$','it\x27s','𝔴','⊴',';','very-big-dream','≽','matches','5sajRiH','three','#Verb','⦱','^#Infinitive$','sep','[home]\x20to','chateaux','⊊','(#Person|#Pronoun)','Ф','verb-like','̑','(west|north|south|east|western|northern|southern|eastern)+\x20#Place','onsuccess','^#Fraction$','some','ú','toString','#HashTag','‌','⊑','guild','{','power','\x20-inferred\x20by\x20#','(king|queen|prince|saint|lady)\x20of\x20#Noun','backgroundColor','#Copula\x20#Gerund\x20[#PresentTense]\x20!by?','some\x20sort\x20of','‏','username','𝕎','rookies','red','ј','whims','why','#success-notice','research','#Singular','true¦eBf7nin5s3t0zeroE;enDhir1we0;lfCn7;d,t3;e0ixt8;cond,vent7;et0th;e6ie7;i2o0;r0urt3;tie4;ft1rst;ight0lev1;e0h,ie1;en0;th','which\x20[%Noun|Verb%]\x20#Noun','holiday-day','company-wide','⥱','(for|with|of)\x20#Noun\x20(and|or|not)\x20[%Noun|Verb%]','strong','wholly','catch',')\x20#Cardinal\x20#Cardinal','(want|wants)','θ','regex','Ė','UPPER_A','𝒵','Ð','↽','Ú','⤷','propernoun-place','remove','#Organization+','∠⃒','[(subject|subjects|subjected)]\x20to','and-5-cents','march-and-feb','Time','2-tagYear-far','xas','↳','Č','%Person|Verb%\x20#Person','true¦fri2mon2s1t0wednesd3;hurs1ues1;aturd1und1;!d0;ay0;!s','█','exp','⤑','any-verbs-for','#PastTense\x20and\x20[%Adj|Past%]','^[#Infinitive]\x20(your|my|the|a|an|any|each|every|some|more|with|on)','title-van-title','(#PresentTense|#PastTense)\x20[will\x20be]$','¥','still-advb','⪌','≅','the-premier-university','fitful','killUnicode','²','sextillion','mr-2','closeButtonHelpModal','⊮','Ђ','Ж','have\x20not','⨯','⏝','ProperNoun','tpl_host_fuzzy','#Copula\x20[far\x20too]\x20#Adjective','html','Ρ','eus','⌶','model','so-adv','SportsTeam','≍','≡','battlefield','Show\x20chain\x20of\x20thought','td_open','Ž','pounds','Token','⋾','⋲','consumed','⥑','firstTerms','april','hardbreak',']\x20!#Noun?','∽','2-prefix','inline','metropolis','https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=','[much]\x20#Adjective','tpl_host_no_ip_fuzzy','cis','#load_chats','⇝','him','⥙','plc','getItem','llb','radii','_close','enabled',')\x5c.)+(?:%TLDS%)))','whatd','𝕌','®','the\x20[#Adjective]\x20#Noun','[(#Modal|had|has)]\x20(#Adverb|not)+?\x20[been]\x20(#Adverb|not)+?\x20#Verb','>⃒','message','ĹĺĻļĽľĿŀŁłƚƪǀǏǐȴȽΙӀӏ','hotel','ĉ','validate','ℚ','cus','micro','tbody','#choices','should-smoke','stateNumericDecimal','write','holdings','memorial','⪻','#scroll-target','light','extend','hooks','isExclamation','#more-papers-search-button-','a[cdefgilmnoqrstuwxz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvwxyz]|d[ejkmoz]|e[cegrstu]|f[ijkmor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdeghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eosuw]|s[abcdeghijklmnortuvxyz]|t[cdfghjklmnortvwz]|u[agksyz]|v[aceginu]|w[fs]|y[et]|z[amw]','biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф','(become|fall|grow)\x20#Adverb?\x20[#PastTense]','´','shore','⨑','which-boost-it','as\x20if','united-sportsteam','(#FirstName\x20&&\x20!#Possessive)\x20[#Singular]\x20#Verb','no-doubt','#Value\x20#Noun\x20(st|street|rd|road|crescent|cr|way|tr|terrace|avenue|ave)','sept','TextCardinal','⩦','\x22:\x20','⧜','toOrdinal','(is|are|am|was)\x20going\x20to\x20(#Infinitive|#PresentTense)','gazette','string','𝕓','indices','𝓈','','general','≸','keep-playing','had-been','Region','entertainment','(a|an)','smoked-poutine','configure','transform','mostly','⥸','bacilli','𝒻','⪥','noun-of-noun','vitro','super-strong','auth','⟉','#Determiner\x20[#Singular]\x20said','◂','fromSuperlative','Ċ','quartet','(still|even|just)','month-the-value','chief\x20.\x20officer',')|(','supra','(become|became|becoming|becomes)\x20[#Verb]','agency','dc:title','have\x20[#PresentTense]','⨆','collapsible-content','≊','→','#myModal','(the|this|a|an)\x20[#Infinitive]\x20#Adverb?\x20#Verb','>\x20predict\x20the\x20completion\x20of\x20the\x20query\x20inside\x20the\x20<>.\x20Output\x20only\x20the\x20completed\x20query.','idle','(urban|cardiac|cardiovascular|respiratory|medical|clinical|visual|graphic|creative|dental|exotic|fine|certified|registered|technical|virtual|professional|amateur|junior|senior|special|pharmaceutical|theoretical)+\x20#Noun?\x20#Actor','?x=','F.B.I','dist','gal','a\x20[#Adjective]\x20#Preposition','\x0a\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20Error:\x20\x0a\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20Check\x20your\x20network\x20connection\x20and\x20try\x20again.\x0a\x20\x20\x20\x20\x20\x20\x20\x20','the\x20#Singular\x20[#Infinitive]\x20#Noun','\x20out\x20of\x20','𝓆','hundred\x20million','found','isCollapsed','ideolog','[sun]\x20#Date','fragments_join','⪬︀','satir','normalize','∷','scrollIntoView','true¦0:3O;1:3M;2:3N;3:3D;4:32;5:2V;6:3E;7:3K;8:36;9:3J;A:3B;a3Pb37c2Jd27e23f1Vg1Sh1Mi1Ij1Gk1Dl18m13n11o0Wp0Pques0Sr0EsTtNunderMvKwFyDzB;eroi0oB;ni0o3P;aw2eB;ar2l3;aEed4hispe5i5oCrB;ap8est3i1;n0ErB;ki0r31;i1r2s9tc9;isualizi0oB;lunt1Vti0;stan4ta6;aFeDhin6iCraBy8;c6di0i2vel1M;mi0p8;aBs1;c9si0;l6n2s1;aUcReQhOiMkatKl2Wmo6nowJpeItFuCwB;ea5im37;b35f0FrB;fi0vB;e2Mi2J;aAoryt1KrCuB;d2KfS;etc9ugg3;l3n4;bCi0;ebBi0;oar4;gnBnAt1;a3i0;ip8oB;p8rte2u1;a1r27t1;hCo5reBulp1;a2Qe2;edu3oo3;i3yi0;aKeEi4oCuB;li0n2;oBwi0;fi0;aFcEhear7laxi0nDpor1sB;pon4tructB;r2Iu5;de5;or4yc3;di0so2;p8ti0;aFeacek20laEoCrBublis9;a1Teten4in1oces7;iso2siB;tio2;n2yi0;ckaAin1rB;ki0t1O;fEpeDrganiCvB;erco24ula1;si0zi0;ni0ra1;fe5;avi0QeBur7;gotia1twor6;aDeCi2oB;de3nito5;a2dita1e1ssaA;int0XnBrke1;ifUufactu5;aEeaDiBodAyi0;cen7f1mi1stB;e2i0;r2si0;n4ug9;iCnB;ea4it1;c6l3;ogAuB;dAgg3stif12;ci0llust0VmDnBro2;nova1sp0NterBven1;ac1vie02;agi2plo4;aDea1iCoBun1;l4w3;ki0ri0;nd3rB;roWvB;es1;aCene0Lli4rBui4;ee1ie0N;rde2the5;aHeGiDlCorBros1un4;e0Pmat1;ir1oo4;gh1lCnBs9;anZdi0;i0li0;e3nX;r0Zscina1;a1du01nCxB;erci7plo5;chan1di0ginB;ee5;aLeHiGoub1rCum8wB;el3;aDeCiB;bb3n6vi0;a0Qs7;wi0;rTscoDvi0;ba1coZlBvelo8;eCiB;ve5;ga1;nGti0;aVelebUhSlPoDrBur3yc3;aBos7yi0;f1w3;aLdi0lJmFnBo6pi0ve5;dDsCvinB;ci0;trBul1;uc1;muniDpB;lBo7;ai2;ca1;lBo5;ec1;c9ti0;ap8eaCimToBubT;ni0t9;ni0ri0;aBee5;n1t1;ra1;m8rCs1te5;ri0;vi0;aPeNitMlLoGrDuB;dge1il4llBr8;yi0;an4eat9oadB;cas1;di0;a1mEokB;i0kB;ee8;pi0;bi0;es7oa1;c9i0;gin2lonAt1;gi0;bysit1c6ki0tt3;li0;ki0;bando2cGdverti7gi0pproac9rgDssuCtB;trac1;mi0;ui0;hi0;si0;coun1ti0;ti0;ni0;ng','−','whose\x20[#PresentTense]\x20#Copula','∋','reference-btn-container','gases','ь','true¦earlier','titlecase-acronym-titlecase','В','↰','testSchemaAt','(have|had)\x20[#Adjective]\x20#Preposition\x20.','\x5c$&','Ŀ','⋇','alt','toASCII','1pm-sun','download.txt','expand','𝔞','(was|were|is|are|am)\x20being\x20(#PastTense|#Participle)','the-poor','dehyphenate','det-inf','size','familial','how-to-noun','#Possessive\x20#Ordinal\x20[#PastTense]','univ','sportsfield','3-4pm','used','looksPlural','that-adj','lastIndexOf','dun','#up-chevron','ů','(one|1|a|an)','⥅','formulae','ê','𝔬','whom','use','dataset','even-walk','⫐','national','é','ridge','true¦0:2K;1:2Q;2:2H;3:2B;a2Ob2Bc1Xd1Ses1Rf1Pg1Kh1Gi1Bj17k12l0Zm0On06o04pYqVrSsJtEuBverAw6y4zacatec2S;akut0o0Cu4;cat2k06;a5est\x204isconsin,yomi1K;bengal,virgin0;rwick3shington4;!\x20dc;acruz,mont;dmurt0t4;ah,tar4;\x202La0Y;a6e5laxca1Rripu1Xu4;scaEva;langa1nnessee,x2F;bas0Wm4smOtar25;aulip2Dil\x20nadu;a9i7o5taf12u4ylh1F;ffZrr05s1A;me1Cno1Quth\x204;cWdV;ber0c4kkim,naloa;hu2ily;n5skatchew2xo4;ny;\x20luis\x20potosi,ta\x20catari1;a4hodeA;j4ngp08;asth2shahi;ingh25u4;e4intana\x20roo;bec,en6retaro;ara8e6rince\x20edward4unjab;\x20i4;sl0C;i,nnsylv4rnambu0C;an0;!na;axa0Ydisha,h4klaho20ntar4reg7ss0Cx0H;io;aKeEo6u4;evo\x20le4nav0W;on;r4tt17va\x20scot0;f9mandy,th4;\x204ampton3;c6d5yo4;rk3;ako1N;aroli1;olk;bras1Mva0Cw4;\x205foundland4;!\x20and\x20labrador;brunswick,hamp3jers5mexiTyork4;!\x20state;ey;galPyarit;aAeghala0Mi6o4;nta1r4;dov0elos;ch6dlanDn5ss4zor11;issippi,ouri;as\x20geraPneso18;ig2oac2;dhy12harasht0Gine,ni5r4ssachusetts;anhao,i\x20el,ylG;p4toba;ur;anca3e4incoln3ouisI;e4iR;ds;a6e5h4omi;aka06ul1;ntucky,ra01;bardino,lmyk0ns0Qr4;achay,el0nata0X;alis6har4iangxi;kh4;and;co;daho,llino7n4owa;d5gush4;et0;ia1;is;a6ert5i4un2;dalFm0D;ford3;mp3rya1waii;ansu,eorg0lou7oa,u4;an4izhou,jarat;ajuato,gdo4;ng;cester3;lori4uji2;da;sex;ageUe7o5uran4;go;rs4;et;lawaMrby3;aFeaEh9o4rim08umbr0;ahui7l6nnectic5rsi4ventry;ca;ut;i03orado;la;e5hattisgarh,i4uvash0;apRhuahua;chn5rke4;ss0;ya;ra;lGm4;bridge3peche;a9ihar,r8u4;ck4ryat0;ingham3;shi4;re;emen,itish\x20columb0;h0ja\x20cal8lk7s4v7;hkorto4que;st2;an;ar0;iforn0;ia;dygHguascalientes,lBndhr9r5ss4;am;izo1kans5un4;achal\x207;as;na;a\x204;pradesh;a6ber5t4;ai;ta;ba5s4;ka;ma;ea','frameset','fifth','ine','Adverb','isPositive','lookAfter','You\x20are\x20a\x20creative\x20master\x20mindfulness\x20coach.\x20Share\x20a\x20simple\x201\x20minute\x20non-generic\x20mindfulness\x20practice\x20to\x20help\x20someone\x20rejuvenate.','Date','⋉','⪦','#Stop','waited-until-release','(dc|atlanta|minnesota|manchester|newcastle|sheffield)\x20united','marketing','⫂','lays','meta','map','fla','partners','1031739BNsvcM','presumedMale','(#Copula|#Noun|sure|fun|up)','just-go','later-say','⋍','alert','em_open','Noun|Gerund','management','turn-down','⁣','now\x20that','[(march|may)]\x20#Adverb','scrollHeight','⨭','instant-access','⥬','highland','src_ZCc','@hasComma','a\x20bit\x20much','c\x27mon','sandhill','terrace','#Person\x20#RomanNumeral','𝒸','opinion','ō','(#Pronoun|#Place|#Value|#Person|#Uncountable|#Month|#WeekDay|#Holiday|#Possessive)','𝔶','i\x20(#Adverb|do)?\x20not?\x20[mean]','Ķ','rob-smith','𝔯','pract','throughout','#PresentTense\x20#Plural','unTag','ariz','⋳','@hasHyphen\x20.','#Organization+\x20#Possessive','(want|wants|wanted)\x20to','vol','⋫','guess\x20who','local','emptied','startEntity','proper-person','even\x20better','un-vite','ends','(had|has|have)\x20[been\x20/en$/]','validateLink','not-adverb','1-punct-hyphen\x27\x27','abuses','𝕒','⦤','@hasExclamation','toNice','provincial','slashes','vacuum-sealed','copula-noun-lastname','skipToken','help\x20[(stop|end|make|start)]','[#PastTense]\x20#Auxiliary+\x20#PastTense','grow','bus','#Possessive\x20[%Adj|Noun%]\x20#Noun','tagName','right-of','nebr','%Person|Noun%\x20#Acronym?\x20#ProperNoun','intra','и','src_domain_root','[#PastTense]\x20#Plural\x20are','#Modal\x20[%Person|Verb%]','nge','ŗ',';/?:@&=+$,-_.!~*\x27()#','ı','will\x20have','𝒞','true¦0:38;1:2L;2:3B;a2Xb2Ec22d1Ye1Sf1Mg1Ch1Ai14j12k0Zl0Um0Gn05om2pZqat1KrXsKtCu7v5wal4yemTz3;a25imbabwe;es,lis\x20and\x20futu2Y;a3enezue32ietnam;nuatu,tican\x20city;gTk6nited\x204ruXs3zbeE;\x202Ca,sr;arab\x20emirat0Kkingdom,states3;!\x20of\x20am2Y;!raiV;a8haCimor\x20les0Co7rinidad\x205u3;nis0rk3valu;ey,me2Zs\x20and\x20caic1V;and\x20t3t3;oba1L;go,kel10nga;iw2ji3nz2T;ki2V;aDcotl1eCi9lov8o6pa2Dri\x20lanka,u5w3yr0;az3edAitzerl1;il1;d2riname;lomon1Xmal0uth\x203;afr2KkMsud2;ak0en0;erra\x20leoFn3;gapo1Yt\x20maart3;en;negLrb0ychellZ;int\x203moa,n\x20marino,udi\x20arab0;hele26luc0mart21;epublic\x20of\x20ir0Eom2Euss0w3;an27;a4eIhilippinUitcairn1Mo3uerto\x20riN;l1rtugF;ki2Dl4nama,pua\x20new0Vra3;gu7;au,esti3;ne;aBe9i7or3;folk1Ith4w3;ay;\x20k3ern\x20mariana1D;or0O;caragua,ger3ue;!ia;p3ther1Aw\x20zeal1;al;mib0u3;ru;a7exi6icro0Bo3yanm06;ldova,n3roc5zambA;a4gol0t3;enegro,serrat;co;cAdagasc01l7r5urit4yot3;te;an0i16;shall0Xtin3;ique;a4div3i,ta;es;wi,ys0;ao,ed02;a6e5i3uxembourg;b3echtenste12thu1G;er0ya;ban0Isotho;os,tv0;azakh1Fe4iriba04o3uwait,yrgyz1F;rXsovo;eling0Knya;a3erG;ma16p2;c7nd6r4s3taly,vory\x20coast;le\x20of\x20m2rael;a3el1;n,q;ia,oJ;el1;aiTon3ungary;dur0Ng\x20kong;aBermany,ha0QibraltAre8u3;a6ern5inea3ya0P;!\x20biss3;au;sey;deloupe,m,tema0Q;e3na0N;ce,nl1;ar;bUmb0;a7i6r3;ance,ench\x203;guia0Epoly3;nes0;ji,nl1;lklandUroeU;ast\x20tim7cu6gypt,l\x20salv6ngl1quatorial4ritr5st3thiop0;on0;\x20guin3;ea;ad3;or;enmark,jibou5ominica4r\x20con3;go;!n\x20C;ti;aBentral\x20african\x20Ah8o5roat0u4yprRzech3;\x209ia;ba,racao;c4lo3morQngo\x20brazzaville,okGsta\x20r04te\x20de\x20ivoiL;mb0;osE;i3ristmasG;le,na;republic;m3naUpe\x20verde,ymanA;bod0ero3;on;aGeDhut2o9r5u3;lgar0r3;kina\x20faso,ma,undi;azil,itish\x203unei;virgin3;\x20is3;lands;liv0nai5snia\x20and\x20herzegoviHtswaHuvet3;\x20isl1;and;re;l3n8rmuG;ar3gium,ize;us;h4ngladesh,rbad3;os;am4ra3;in;as;fghaGlDmBn6r4ustr3zerbaij2;al0ia;genti3men0uba;na;dorra,g5t3;arct7igua\x20and\x20barbu3;da;o3uil3;la;er3;ica;b3ger0;an0;ia;ni3;st2;an','){0,61}','irl','have\x20been','⧝','got\x20#Adverb?\x20[#PastTense]\x20of','Ā','asst','chronological','ai-message','⨖','a-rep-to','true¦its,my,our0thy;!s','tens','heading_open','[used\x20to]\x20#PresentTense','(in|by|before|during|on|until|after|of|within|all)\x20[march]','(was|were|had|have)','Error','even-left','blockquote_open','firstName','theater','#Comparative','Ō','hasQuote','do-eat','faith-based','success','𝕋','render','https://harvard-edge.github.io/cs249r_book/','parseMatch','municipal','valley','^#Verb\x20.\x20#Noun+$','⫫','BACKGROUND\x20KNOWLEDGE','getTime','sliderValue','seems-filled','a-warhol','updateSlider','
(#Cardinal|a)+]\x20[#Fraction+]','redact','Ш','ġ','emojis','holy\x20(shit|fuck|hell)','blue','fromPresent','basic\x20understanding','⊀','ï','link','ĤĥĦħƕǶȞȟΉΗЂЊЋНнђћҢңҤҥҺһӉӊ','϶','true¦leaves','the\x20[#Verb]\x20#Preposition\x20.','terraces','time-range','⨒','𝒴','#FirstName\x20#Acronym\x20#Noun','stadium','ruler','max','true¦0:18;1:1E;2:1D;3:14;a1Db15c0Sd0Kfc\x20dallas,g0Ihouston\x200Hindiana0Gjacksonville\x20jagua0k0El0Am01new\x20UoRpKqueens\x20parkJreal\x20salt\x20lake,sBt6utah\x20jazz,vancouver\x20whitecaps,w4yW;ashington\x204h10;natio1Mredski2wizar0W;ampa\x20bay\x207e6o4;ronto\x204ttenham\x20hotspur;blue\x20ja0Mrapto0;nnessee\x20tita2xasD;buccanee0ra0K;a8eattle\x206porting\x20kansas0Wt4;\x20louis\x204oke0V;c1Drams;marine0s4;eah13ounH;cramento\x20Rn\x204;antonio\x20spu0diego\x204francisco\x20gJjose\x20earthquak1;char08paB;\x20ran07;a9h6ittsburgh\x205ortland\x20t4;imbe0rail\x20blaze0;pirat1steele0;il4oenix\x20su2;adelphia\x204li1;eagl1philNunE;dr1;akland\x204klahoma\x20city\x20thunder,rlando\x20magic;athle0Lrai4;de0;england\x208orleans\x207york\x204;g5je3knYme3red\x20bul0Xy4;anke1;ian3;pelica2sain3;patrio3revolut4;ion;anchEeAi4ontreal\x20impact;ami\x208lwaukee\x20b7nnesota\x204;t5vi4;kings;imberwolv1wi2;rewe0uc0J;dolphi2heat,marli2;mphis\x20grizz4ts;li1;a6eic5os\x20angeles\x204;clippe0dodFlaB;esterV;\x20galaxy,ke0;ansas\x20city\x204nF;chiefs,roya0D;\x20pace0polis\x20col3;astr05dynamo,rocke3texa2;olden\x20state\x20warrio0reen\x20bay\x20pac4;ke0;allas\x208e4i04od6;nver\x206troit\x204;lio2pisto2ti4;ge0;broncYnugge3;cowbo5maver4;icZ;ys;arEelLhAincinnati\x208leveland\x206ol4;orado\x20r4umbus\x20crew\x20sc;api7ocki1;brow2cavalie0guar4in4;dia2;bengaVre4;ds;arlotte\x20horAicago\x204;b5cubs,fire,wh4;iteB;ea0ulQ;diff4olina\x20panthe0;\x20city;altimore\x20Alackburn\x20rove0oston\x206rooklyn\x204uffalo\x20bilN;ne3;ts;cel5red4;\x20sox;tics;rs;oriol1rave2;rizona\x20Ast8tlanta\x204;brav1falco2h4;awA;ns;es;on\x20villa,r4;os;c6di4;amondbac4;ks;ardi4;na4;ls','#Plural\x20','100vw','irregulars','ⅈ','__compiled__','⅔','the-x-which','be-walking','labs','bank','𝔼','start_i','does-he','link_no_ip_fuzzy','dr-john-foo','src_host_port_strict','#Reflexive$','isStatement','whether','help-stop','completely-beaten','https://export.arxiv.org/api/query?search_query=','dirty','#delete-','eleven','src_host_terminator','𝕀','sort','return_all_fields','#down-chevron','toSuperlative','tpl_host_port_fuzzy_strict','portland-or','though','(that|which)\x20were\x20[%Adj|Gerund%]','neighborhood','↪','eight','#Copula\x20(#Adverb|not)+?\x20[going\x20to]\x20#Adverb+?\x20#PresentTense','VALUE_LENGTH','≹','detail',''','notA','splice','no-golden','he\x27s','mountains','∪','≿','true¦b8ch7dr6foster,gra5ja9lan4ma2ni9ollie,p1rob,s0wade;kip,pike,t5ue;at,eg,ier2;ck,r0;k,shal;ce;ce,nt;ew;ase,u1;iff,l1ob,u0;ck;aze,ossom','ψ','ð','#Value\x20seconds','⋈','textvalue-date','⩸','llm_model','container_','
       
- A resource for learning about machine learning systems. + A resource for learning about applied machine learning.
Prof. Vijay Janapa Reddi, Harvard University. diff --git a/search.json b/search.json index 6b468f6d..4d7e32df 100644 --- a/search.json +++ b/search.json @@ -59,7 +59,7 @@ "href": "contents/contributors.html", "title": "Contributors", "section": "", - "text": "We extend our sincere thanks to the diverse group of individuals who have generously contributed their expertise, insights, and time to enhance both the content and codebase of this project. Below you will find a list of all contributors. If you would like to contribute to this project, please see our GitHub page.\n\n\n\n\n\n\n\n\nVijay Janapa Reddi\n\n\nIkechukwu Uchendu\n\n\nnaeemkh\n\n\nDouwe den Blanken\n\n\nShanzeh Batool\n\n\n\n\nMatthew Stewart\n\n\nJared Ping\n\n\neliasab16\n\n\nishapira\n\n\nMaximilian Lam\n\n\n\n\nMarcelo Rovai\n\n\nJayson Lin\n\n\nSophia Cho\n\n\nJeffrey Ma\n\n\nKorneel Van den Berghe\n\n\n\n\nZishen\n\n\nKleinbard\n\n\nSrivatsan Krishnan\n\n\nAndrea Murillo\n\n\nAlex Rodriguez\n\n\n\n\narnaumarin\n\n\nAghyad Deeb\n\n\nAbdulrahman Mahmoud\n\n\nAghyad Deeb\n\n\nDivya\n\n\n\n\nELSuitorHarvard\n\n\noishib\n\n\nMichael Schnebly\n\n\nEmil Njor\n\n\nJared Ni\n\n\n\n\nMark Mazumder\n\n\nYU SHUN, HSIAO\n\n\nHenry Bae\n\n\nJae-Won Chung\n\n\nJennifer Zhou\n\n\n\n\nPong Trairatvorakul\n\n\nAditi Raju\n\n\nMarco Zennaro\n\n\nColby Banbury\n\n\nAndrew Bass\n\n\n\n\neurashin\n\n\nShvetank Prakash\n\n\nEric D\n\n\nAlex Oesterling\n\n\nBruno Scaglione\n\n\n\n\nAllen-Kuang\n\n\nGauri Jain\n\n\nSara Khosravi\n\n\nShreya Johri\n\n\nThe Random DIY\n\n\n\n\nVijay Edupuganti\n\n\nBatur Arslan\n\n\nCostin-Andrei Oncescu\n\n\nYu-Shun Hsiao\n\n\nSonia Murthy\n\n\n\n\nhappyappledog\n\n\nsonghan\n\n\nyanjingl\n\n\nJason Yik\n\n\nCurren Iyer\n\n\n\n\nAnnie Laurie Cook\n\n\nEmeka Ezike\n\n\nJessica Quaye\n\n\nabigailswallow", + "text": "We extend our sincere thanks to the diverse group of individuals who have generously contributed their expertise, insights, and time to enhance both the content and codebase of this project. Below you will find a list of all contributors. If you would like to contribute to this project, please see our GitHub page.\n\n\n\n\n\n\n\n\nVijay Janapa Reddi\n\n\nIkechukwu Uchendu\n\n\nnaeemkh\n\n\nDouwe den Blanken\n\n\nShanzeh Batool\n\n\n\n\nJared Ping\n\n\neliasab16\n\n\nishapira\n\n\nKleinbard\n\n\nMarcelo Rovai\n\n\n\n\nMaximilian Lam\n\n\nMatthew Stewart\n\n\nJayson Lin\n\n\nJeffrey Ma\n\n\nSophia Cho\n\n\n\n\nKorneel Van den Berghe\n\n\nZishen\n\n\nAndrea Murillo\n\n\nAlex Rodriguez\n\n\nSrivatsan Krishnan\n\n\n\n\nAbdulrahman Mahmoud\n\n\narnaumarin\n\n\nAghyad Deeb\n\n\nAghyad Deeb\n\n\nDivya\n\n\n\n\nMichael Schnebly\n\n\nJared Ni\n\n\noishib\n\n\nEmil Njor\n\n\nELSuitorHarvard\n\n\n\n\nMark Mazumder\n\n\nYU SHUN, HSIAO\n\n\nSara Khosravi\n\n\nHenry Bae\n\n\nJae-Won Chung\n\n\n\n\nPong Trairatvorakul\n\n\nMarco Zennaro\n\n\nJennifer Zhou\n\n\nAditi Raju\n\n\nAndrew Bass\n\n\n\n\neurashin\n\n\nColby Banbury\n\n\nShvetank Prakash\n\n\nAllen-Kuang\n\n\nSercan Aygün\n\n\n\n\nEric D\n\n\nBruno Scaglione\n\n\nAlex Oesterling\n\n\nGauri Jain\n\n\nThe Random DIY\n\n\n\n\nBatur Arslan\n\n\nCostin-Andrei Oncescu\n\n\nCurren Iyer\n\n\nSonia Murthy\n\n\nShreya Johri\n\n\n\n\nEmeka Ezike\n\n\nSophia Cho\n\n\nYu-Shun Hsiao\n\n\nhappyappledog\n\n\nabigailswallow\n\n\n\n\nsonghan\n\n\nAnnie Laurie Cook\n\n\nVijay Edupuganti\n\n\nyanjingl\n\n\nJason Yik\n\n\n\n\nJessica Quaye", "crumbs": [ "FRONT MATTER", "Contributors" @@ -198,7 +198,7 @@ "section": "", "text": "DALL·E 3 Prompt: A detailed, rectangular, flat 2D illustration depicting a roadmap of a book’s chapters on machine learning systems, set on a crisp, clean white background. The image features a winding road traveling through various symbolic landmarks. Each landmark represents a chapter topic: Introduction, ML Systems, Deep Learning, AI Workflow, Data Engineering, AI Frameworks, AI Training, Efficient AI, Model Optimizations, AI Acceleration, Benchmarking AI, On-Device Learning, Embedded AIOps, Security & Privacy, Responsible AI, Sustainable AI, AI for Good, Robust AI, Generative AI. The style is clean, modern, and flat, suitable for a technical book, with each landmark clearly labeled with its chapter title.\n\n\nIn the early 1990s, Mark Weiser, a pioneering computer scientist, introduced the world to a revolutionary concept that would forever change how we interact with technology. He envisioned a future where computing would be seamlessly integrated into our environments, becoming an invisible, integral part of daily life. This vision, which he termed “ubiquitous computing,” promised a world where technology would serve us without demanding our constant attention or interaction. Fast forward to today, and we find ourselves on the cusp of realizing Weiser’s vision, thanks to the advent and proliferation of machine learning systems.\nThe concept of ubiquitous computing (Weiser 1991), as envisioned by Weiser, involves more than just incorporating processors into ordinary objects. It revolves around infusing our surroundings with a kind of intelligence that can predict our requirements and take action on our behalf, enhancing our experiences without us having to issue explicit commands. The crucial element of this pervasive intelligence is developing and implementing machine learning systems at the edge of our networks.\n\nWeiser, Mark. 1991. “The Computer for the 21st Century.” Sci. Am. 265 (3): 94–104. https://doi.org/10.1038/scientificamerican0991-94.\nMachine learning, a subset of artificial intelligence, enables computers to learn from and make decisions based on data rather than following explicitly programmed instructions. When deployed at the edge, closer to data generation and action, these systems can process information in real time with minimal latency. This is critical for TinyML applications, where fast response is crucial, such as autonomous vehicles, real-time translation, and smart healthcare devices.\nThe migration of machine learning from centralized data centers to the edge of networks marks a significant evolution in computing architecture. The need for speed, privacy, and reduced bandwidth consumption drives this shift. By processing data locally, edge-based machine learning systems can make quick decisions without constantly communicating with a central server. This speeds up response times, conserves bandwidth, and enhances privacy by limiting the amount of data transmitted over the network.\nMoreover, the ability to deploy machine learning models in diverse environments has led to an explosion of innovative applications. From smart cities that optimize traffic flow in real-time to agricultural drones that monitor crop health and apply treatments precisely where needed, machine learning at the edge enables a level of contextual awareness and responsiveness that was previously unimaginable.\nDespite the promise of ubiquitous intelligence, deploying machine learning systems at the edge is challenging. These systems must operate within the constraints of limited processing power, memory, and energy availability, often in environments far from the controlled conditions of data centers. Additionally, ensuring the privacy and security of the data in these systems processes is paramount, particularly in applications that handle sensitive personal information.\nDeveloping machine learning models that are efficient enough to run at the edge while delivering accurate and reliable results requires innovative model design, training, and deployment approaches. Researchers and engineers are exploring techniques such as model compression, federated learning, and transfer learning to address these challenges.\nAs we stand on the threshold of Weiser’s vision of ubiquitous computing, machine learning systems are clear as the key to unlocking this future. By embedding intelligence in the fabric of our environment, these systems have the potential to make our interactions with technology more natural and intuitive than ever before. As we continue to push the boundaries of what’s possible with machine learning at the edge, we move closer to a world where technology quietly enhances our lives without ever getting in the way.\nIn this book, we will explore the technical foundations of machine learning systems, the challenges of deploying these systems at the edge, and the vast array of applications they enable. Join us as we embark on a journey into the future of ubiquitous intelligence, where the seamless integration of technology into our daily lives transforms the essence of how we live, work, and interact with the world around us.", "crumbs": [ - "MAIN", + "Fundamentals", "1  Introduction" ] }, @@ -209,7 +209,7 @@ "section": "", "text": "2.1 Machine Learning Systems\nML is rapidly evolving, with new paradigms reshaping how models are developed, trained, and deployed. One such paradigm is embedded machine learning, which is experiencing significant innovation driven by the proliferation of smart sensors, edge devices, and microcontrollers. Embedded machine learning refers to the integration of machine learning algorithms into the hardware of a device, enabling real-time data processing and analysis without relying on cloud connectivity. This chapter explores the landscape of embedded machine learning, covering the key approaches of Cloud ML, Edge ML, and TinyML (Figure 2.1).\nWe begin by outlining each embedded ML variant’s features or characteristics, benefits, challenges, and use cases. This provides context on where these technologies do well and where they face limitations. We then combine all three approaches into a comparative analysis, evaluating them across critical parameters like latency, privacy, computational demands, and more. This side-by-side perspective highlights the unique strengths and tradeoffs of selecting these strategies.\nNext, we trace the evolution timeline of embedded systems and machine learning, from the origins of wireless sensor networks to the integration of ML algorithms into microcontrollers. This historical lens enriches our understanding of the rapid pace of advancement in this domain. Finally, practical hands-on exercises offer an opportunity to experiment first-hand with embedded computer vision applications.\nBy the end of this multipronged exploration of embedded ML, you will possess the conceptual and practical knowledge to determine the appropriate ML implementation for your specific use case constraints. The chapter aims to equip you with the contextual clarity and technical skills to navigate this quickly shifting landscape, empowering impactful innovations.", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -220,7 +220,7 @@ "section": "", "text": "Figure 2.1: Cloud vs. Edge vs. TinyML: The Spectrum of Distributed Intelligence. Credit: Massimo Banzi – Arduino.", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -231,7 +231,7 @@ "section": "2.2 Cloud ML", "text": "2.2 Cloud ML\n\n2.2.1 Characteristics\nCloud ML is a specialized branch of the broader machine learning field within cloud computing environments. It offers a virtual platform for developing, training, and deploying machine learning models, providing flexibility and scalability.\nAt its foundation, Cloud ML utilizes a powerful blend of high-capacity servers, expansive storage solutions, and robust networking architectures, all located in data centers worldwide (Figure 2.2). This setup centralizes computational resources, simplifying the management and scaling of machine learning projects.\nThe cloud environment excels in data processing and model training and is designed to manage large data volumes and complex computations. Models crafted in Cloud ML can leverage vast amounts of data, processed and analyzed centrally, thereby enhancing the model’s learning and predictive performance.\n\n\n\n\n\n\nFigure 2.2: Cloud TPU data center at Google. Credit: Google.\n\n\n\n\n\n2.2.2 Benefits\nCloud ML is synonymous with immense computational power, adept at handling complex algorithms and large datasets. This is particularly advantageous for machine learning models that demand significant computational resources, effectively circumventing the constraints of local setups.\nA key advantage of Cloud ML is its dynamic scalability. As data volumes or computational needs grow, the infrastructure can adapt seamlessly, ensuring consistent performance.\nCloud ML platforms often offer a wide array of advanced tools and algorithms. Developers can utilize these resources to accelerate the building, training, and deploying sophisticated models, fostering innovation.\n\n\n2.2.3 Challenges\nDespite its capabilities, Cloud ML can face latency issues, especially in applications that require real-time responses. The time taken to send data to centralized servers and back can introduce delays, a significant drawback in time-sensitive scenarios.\nCentralizing data processing and storage can also create data privacy and security vulnerabilities. Data centers become attractive targets for cyber-attacks, requiring substantial investments in security measures to protect sensitive data.\nAdditionally, as data processing needs escalate, so do the costs of using cloud services. Organizations dealing with large data volumes may encounter rising costs, which could affect the long-term scalability and feasibility of their operations.\n\n\n2.2.4 Example Use Cases\nCloud ML is important in powering virtual assistants like Siri and Alexa. These systems harness the cloud’s computational prowess to analyze and process voice inputs, delivering intelligent and personalized responses to users.\nIt also provides the foundation for advanced recommendation systems in platforms like Netflix and Amazon. These systems sift through extensive datasets to identify patterns and preferences, offering personalized content or product suggestions to boost user engagement.\nIn the financial realm, Cloud ML has created robust fraud detection systems. These systems scrutinize vast amounts of transactional data to flag potential fraudulent activities, enabling timely interventions and reducing financial risks.\nIn summary, it’s virtually impossible to navigate the internet today without encountering some form of Cloud ML, directly or indirectly. From the personalized ads on your social media feed to the predictive text features in email services, Cloud ML is deeply integrated into our online experiences. It powers smart algorithms that recommend products on e-commerce sites, fine-tunes search engines to deliver accurate results, and even automates the tagging and categorization of photos on platforms like Facebook.\nFurthermore, Cloud ML bolsters user security through anomaly detection systems that monitor for unusual activities, potentially shielding users from cyber threats. It acts as the unseen powerhouse, continuously operating behind the scenes to refine, secure, and personalize our digital interactions, making the modern internet a more intuitive and user-friendly environment.", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -242,7 +242,7 @@ "section": "2.3 Edge ML", "text": "2.3 Edge ML\n\n2.3.1 Characteristics\nDefinition of Edge ML\nEdge Machine Learning (Edge ML) runs machine learning algorithms directly on endpoint devices or closer to where the data is generated rather than relying on centralized cloud servers. This approach aims to bring computation closer to the data source, reducing the need to send large volumes of data over networks, often resulting in lower latency and improved data privacy.\nDecentralized Data Processing\nIn Edge ML, data processing happens in a decentralized fashion. Instead of sending data to remote servers, the data is processed locally on devices like smartphones, tablets, or IoT devices (Figure 2.3). This local processing allows devices to make quick decisions based on the data they collect without relying heavily on a central server’s resources. This decentralization is particularly important in real-time applications where even a slight delay can have significant consequences.\nLocal Data Storage and Computation\nLocal data storage and computation are key features of Edge ML. This setup ensures that data can be stored and analyzed directly on the devices, thereby maintaining the privacy of the data and reducing the need for constant internet connectivity. Moreover, this often leads to more efficient computation, as data doesn’t have to travel long distances, and computations are performed with a more nuanced understanding of the local context, which can sometimes result in more insightful analyses.\n\n\n\n\n\n\nFigure 2.3: Edge ML Examples. Credit: Edge Impulse.\n\n\n\n\n\n2.3.2 Benefits\nReduced Latency\nOne of Edge ML’s main advantages is the significant latency reduction compared to Cloud ML. This reduced latency can be a critical benefit in situations where milliseconds count, such as in autonomous vehicles, where quick decision-making can mean the difference between safety and an accident.\nEnhanced Data Privacy\nEdge ML also offers improved data privacy, as data is primarily stored and processed locally. This minimizes the risk of data breaches that are more common in centralized data storage solutions. Sensitive information can be kept more secure, as it’s not sent over networks that could be intercepted.\nLower Bandwidth Usage\nOperating closer to the data source means less data must be sent over networks, reducing bandwidth usage. This can result in cost savings and efficiency gains, especially in environments where bandwidth is limited or costly.\n\n\n2.3.3 Challenges\nLimited Computational Resources Compared to Cloud ML\nHowever, Edge ML has its challenges. One of the main concerns is the limited computational resources compared to cloud-based solutions. Endpoint devices may have a different processing power or storage capacity than cloud servers, limiting the complexity of the machine learning models that can be deployed.\nComplexity in Managing Edge Nodes\nManaging a network of edge nodes can introduce complexity, especially regarding coordination, updates, and maintenance. Ensuring all nodes operate seamlessly and are up-to-date with the latest algorithms and security protocols can be a logistical challenge.\nSecurity Concerns at the Edge Nodes\nWhile Edge ML offers enhanced data privacy, edge nodes can sometimes be more vulnerable to physical and cyber-attacks. Developing robust security protocols that protect data at each node without compromising the system’s efficiency remains a significant challenge in deploying Edge ML solutions.\n\n\n2.3.4 Example Use Cases\nEdge ML has many applications, from autonomous vehicles and smart homes to industrial IoT. These examples were chosen to highlight scenarios where real-time data processing, reduced latency, and enhanced privacy are not just beneficial but often critical to the operation and success of these technologies. They demonstrate the pivotal role that Edge ML can play in driving advancements in various sectors, fostering innovation, and paving the way for more intelligent, responsive, and adaptive systems.\nAutonomous Vehicles\nAutonomous vehicles stand as a prime example of Edge ML’s potential. These vehicles rely heavily on real-time data processing to navigate and make decisions. Localized machine learning models assist in quickly analyzing data from various sensors to make immediate driving decisions, ensuring safety and smooth operation.\nSmart Homes and Buildings\nEdge ML plays a crucial role in efficiently managing various systems in smart homes and buildings, from lighting and heating to security. By processing data locally, these systems can operate more responsively and harmoniously with the occupants’ habits and preferences, creating a more comfortable living environment.\nIndustrial IoT\nThe Industrial Internet of Things (IoT) leverages Edge ML to monitor and control complex industrial processes. Here, machine learning models can analyze data from numerous sensors in real-time, enabling predictive maintenance, optimizing operations, and enhancing safety measures. This revolution in industrial automation and efficiency.\nThe applicability of Edge ML is vast and not limited to these examples. Various other sectors, including healthcare, agriculture, and urban planning, are exploring and integrating Edge ML to develop innovative solutions responsive to real-world needs and challenges, heralding a new era of smart, interconnected systems.", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -253,7 +253,7 @@ "section": "2.4 Tiny ML", "text": "2.4 Tiny ML\n\n2.4.1 Characteristics\nDefinition of TinyML\nTinyML sits at the crossroads of embedded systems and machine learning, representing a burgeoning field that brings smart algorithms directly to tiny microcontrollers and sensors. These microcontrollers operate under severe resource constraints, particularly regarding memory, storage, and computational power (see a TinyML kit example in Figure 2.4).\nOn-Device Machine Learning\nIn TinyML, the focus is on on-device machine learning. This means that machine learning models are deployed and trained on the device, eliminating the need for external servers or cloud infrastructures. This allows TinyML to enable intelligent decision-making right where the data is generated, making real-time insights and actions possible, even in settings where connectivity is limited or unavailable.\nLow Power and Resource-Constrained Environments\nTinyML excels in low-power and resource-constrained settings. These environments require highly optimized solutions that function within the available resources. TinyML meets this need through specialized algorithms and models designed to deliver decent performance while consuming minimal energy, thus ensuring extended operational periods, even in battery-powered devices.\n\n\n\n\n\n\nFigure 2.4: Examples of TinyML device kits. Credit: Widening Access to Applied Machine Learning with TinyML.\n\n\n\n\n\n\n\n\n\nExercise 2.1: TinyML with Arduino\n\n\n\n\n\nGet ready to bring machine learning to the smallest of devices! In the embedded machine learning world, TinyML is where resource constraints meet ingenuity. This Colab notebook will walk you through building a gesture recognition model designed on an Arduino board. You’ll learn how to train a small but effective neural network, optimize it for minimal memory usage, and deploy it to your microcontroller. If you’re excited about making everyday objects smarter, this is where it begins!\n\n\n\n\n\n\n2.4.2 Benefits\nExtremely Low Latency\nOne of the standout benefits of TinyML is its ability to offer ultra-low latency. Since computation occurs directly on the device, the time required to send data to external servers and receive a response is eliminated. This is crucial in applications requiring immediate decision-making, enabling quick responses to changing conditions.\nHigh Data Security\nTinyML inherently enhances data security. Because data processing and analysis happen on the device, the risk of data interception during transmission is virtually eliminated. This localized approach to data management ensures that sensitive information stays on the device, strengthening user data security.\nEnergy Efficiency\nTinyML operates within an energy-efficient framework, a necessity given its resource-constrained environments. By employing lean algorithms and optimized computational methods, TinyML ensures that devices can execute complex tasks without rapidly depleting battery life, making it a sustainable option for long-term deployments.\n\n\n2.4.3 Challenges\nLimited Computational Capabilities\nHowever, the shift to TinyML comes with its set of hurdles. The primary limitation is the devices’ constrained computational capabilities. The need to operate within such limits means that deployed models must be simplified, which could affect the accuracy and sophistication of the solutions.\nComplex Development Cycle\nTinyML also introduces a complicated development cycle. Crafting lightweight and effective models demands a deep understanding of machine learning principles and expertise in embedded systems. This complexity calls for a collaborative development approach, where multi-domain expertise is essential for success.\nModel Optimization and Compression\nA central challenge in TinyML is model optimization and compression. Creating machine learning models that can operate effectively within the limited memory and computational power of microcontrollers requires innovative approaches to model design. Developers often face the challenge of striking a delicate balance and optimizing models to maintain effectiveness while fitting within stringent resource constraints.\n\n\n2.4.4 Example Use Cases\nWearable Devices\nIn wearables, TinyML opens the door to smarter, more responsive gadgets. From fitness trackers offering real-time workout feedback to smart glasses processing visual data on the fly, TinyML transforms how we engage with wearable tech, delivering personalized experiences directly from the device.\nPredictive Maintenance\nIn industrial settings, TinyML plays a significant role in predictive maintenance. By deploying TinyML algorithms on sensors that monitor equipment health, companies can preemptively identify potential issues, reducing downtime and preventing costly breakdowns. On-site data analysis ensures quick responses, potentially stopping minor issues from becoming major problems.\nAnomaly Detection\nTinyML can be employed to create anomaly detection models that identify unusual data patterns. For instance, a smart factory could use TinyML to monitor industrial processes and spot anomalies, helping prevent accidents and improve product quality. Similarly, a security company could use TinyML to monitor network traffic for unusual patterns, aiding in detecting and preventing cyber-attacks. TinyML could monitor patient data for anomalies in healthcare, aiding early disease detection and better patient treatment.\nEnvironmental Monitoring\nIn environmental monitoring, TinyML enables real-time data analysis from various field-deployed sensors. These could range from city air quality monitoring to wildlife tracking in protected areas. Through TinyML, data can be processed locally, allowing for quick responses to changing conditions and providing a nuanced understanding of environmental patterns, crucial for informed decision-making.\nIn summary, TinyML serves as a trailblazer in the evolution of machine learning, fostering innovation across various fields by bringing intelligence directly to the edge. Its potential to transform our interaction with technology and the world is immense, promising a future where devices are connected, intelligent, and capable of making real-time decisions and responses.", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -264,7 +264,7 @@ "section": "2.5 Comparison", "text": "2.5 Comparison\nUp to this point, we’ve explored each of the different ML variants individually. Now, let’s bring them all together for a comprehensive view. Table 2.1 offers a comparative analysis of Cloud ML, Edge ML, and TinyML based on various features and aspects. This comparison aims to provide a clear perspective on the unique advantages and distinguishing factors, aiding in making informed decisions based on the specific needs and constraints of a given application or project.\n\n\n\nTable 2.1: Comparison of feature aspects across Cloud ML, Edge ML, and TinyML.\n\n\n\n\n\n\n\n\n\n\n\nFeature/Aspect\nCloud ML\nEdge ML\nTinyML\n\n\n\n\nProcessing Location\nCentralized servers (Data Centers)\nLocal devices (closer to data sources)\nOn-device (microcontrollers, embedded systems)\n\n\nLatency\nHigh (Depends on internet connectivity)\nModerate (Reduced latency compared to Cloud ML)\nLow (Immediate processing without network delay)\n\n\nData Privacy\nModerate (Data transmitted over networks)\nHigh (Data remains on local networks)\nVery High (Data processed on-device, not transmitted)\n\n\nComputational Power\nHigh (Utilizes powerful data center infrastructure)\nModerate (Utilizes local device capabilities)\nLow (Limited to the power of the embedded system)\n\n\nEnergy Consumption\nHigh (Data centers consume significant energy)\nModerate (Less than data centers, more than TinyML)\nLow (Highly energy-efficient, designed for low power)\n\n\nScalability\nHigh (Easy to scale with additional server resources)\nModerate (Depends on local device capabilities)\nLow (Limited by the hardware resources of the device)\n\n\nCost\nHigh (Recurring costs for server usage, maintenance)\nVariable (Depends on the complexity of local setup)\nLow (Primarily upfront costs for hardware components)\n\n\nConnectivity Dependence\nHigh (Requires stable internet connectivity)\nLow (Can operate with intermittent connectivity)\nVery Low (Can operate without any network connectivity)\n\n\nReal-time Processing\nModerate (Can be affected by network latency)\nHigh (Capable of real-time processing locally)\nVery High (Immediate processing with minimal latency)\n\n\nApplication Examples\nBig Data Analysis, Virtual Assistants\nAutonomous Vehicles, Smart Homes\nWearables, Sensor Networks\n\n\nDevelopment Complexity\nModerate to High (Requires knowledge in cloud computing)\nModerate (Requires knowledge in local network setup)\nModerate to High (Requires expertise in embedded systems)", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -275,7 +275,7 @@ "section": "2.6 Conclusion", "text": "2.6 Conclusion\nIn this chapter, we’ve offered a panoramic view of the evolving landscape of machine learning, covering cloud, edge, and tiny ML paradigms. Cloud-based machine learning leverages the immense computational resources of cloud platforms to enable powerful and accurate models but comes with limitations, including latency and privacy concerns. Edge ML mitigates these limitations by bringing inference directly to edge devices, offering lower latency and reduced connectivity needs. TinyML takes this further by miniaturizing ML models to run directly on highly resource-constrained devices, opening up a new category of intelligent applications.\nEach approach has its tradeoffs, including model complexity, latency, privacy, and hardware costs. Over time, we anticipate converging these embedded ML approaches, with cloud pre-training facilitating more sophisticated edge and tiny ML implementations. Advances like federated learning and on-device learning will enable embedded devices to refine their models by learning from real-world data.\nThe embedded ML landscape is rapidly evolving and poised to enable intelligent applications across a broad spectrum of devices and use cases. This chapter serves as a snapshot of the current state of embedded ML. As algorithms, hardware, and connectivity continue to improve, we can expect embedded devices of all sizes to become increasingly capable, unlocking transformative new applications for artificial intelligence.", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -284,9 +284,9 @@ "href": "contents/ml_systems/ml_systems.html#sec-ml-systems-resource", "title": "2  ML Systems", "section": "2.7 Resources", - "text": "2.7 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nEmbedded Systems Overview.\nEmbedded Computer Hardware.\nEmbedded I/O.\nEmbedded systems software.\nEmbedded ML software.\nEmbedded Inference.\nTinyML on Microcontrollers.\nTinyML as a Service (TinyMLaaS):\n\nTinyMLaaS: Introduction.\nTinyMLaaS: Design Overview.\n\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "2.7 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nEmbedded Systems Overview.\nEmbedded Computer Hardware.\nEmbedded I/O.\nEmbedded systems software.\nEmbedded ML software.\nEmbedded Inference.\nTinyML on Microcontrollers.\nTinyML as a Service (TinyMLaaS):\n\nTinyMLaaS: Introduction.\nTinyMLaaS: Design Overview.\n\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Fundamentals", "2  ML Systems" ] }, @@ -297,7 +297,7 @@ "section": "", "text": "3.1 Introduction", "crumbs": [ - "MAIN", + "Fundamentals", "3  DL Primer" ] }, @@ -308,7 +308,7 @@ "section": "", "text": "3.1.1 Definition and Importance\nDeep learning, a specialized area within machine learning and artificial intelligence (AI), utilizes algorithms modeled after the structure and function of the human brain, known as artificial neural networks. This field is a foundational element in AI, driving progress in diverse sectors such as computer vision, natural language processing, and self-driving vehicles. Its significance in embedded AI systems is highlighted by its capability to handle intricate calculations and predictions, optimizing the limited resources in embedded settings. Figure 3.1 illustrates the chronological development and relative segmentation of the three fields.\n\n\n\n\n\n\nFigure 3.1: Artificial intelligence subfields. Credit: NVIDIA.\n\n\n\n\n\n3.1.2 Brief History of Deep Learning\nThe idea of deep learning has origins in early artificial neural networks. It has experienced several cycles of interest, starting with the introduction of the Perceptron in the 1950s (Rosenblatt 1957), followed by the invention of backpropagation algorithms in the 1980s (Rumelhart, Hinton, and Williams 1986).\n\nRosenblatt, Frank. 1957. The Perceptron, a Perceiving and Recognizing Automaton Project Para. Cornell Aeronautical Laboratory.\n\nRumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533–36. https://doi.org/10.1038/323533a0.\n\nKrizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake Tahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, 1106–14. https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.\nThe term “deep learning” became prominent in the 2000s, characterized by advances in computational power and data accessibility. Important milestones include the successful training of deep networks like AlexNet (Krizhevsky, Sutskever, and Hinton 2012) by Geoffrey Hinton, a leading figure in AI, and the renewed focus on neural networks as effective tools for data analysis and modeling.\nDeep learning has recently seen exponential growth, transforming various industries. Computational growth followed an 18-month doubling pattern from 1952 to 2010, which then accelerated to a 6-month cycle from 2010 to 2022, as shown in Figure 3.2. Concurrently, we saw the emergence of large-scale models between 2015 and 2022, appearing 2 to 3 orders of magnitude faster and following a 10-month doubling cycle.\n\n\n\n\n\n\nFigure 3.2: Growth of deep learning models.\n\n\n\nMultiple factors have contributed to this surge, including advancements in computational power, the abundance of big data, and improvements in algorithmic designs. First, the growth of computational capabilities, especially the arrival of Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) (Jouppi et al. 2017), has significantly sped up the training and inference times of deep learning models. These hardware improvements have enabled the construction and training of more complex, deeper networks than what was possible in earlier years.\n\nJouppi, Norman P., Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, et al. 2017. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” In Proceedings of the 44th Annual International Symposium on Computer Architecture, 1–12. ISCA ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\nSecond, the digital revolution has yielded a wealth of big data, offering rich material for deep learning models to learn from and excel in tasks such as image and speech recognition, language translation, and game playing. Large, labeled datasets have been key in refining and successfully deploying deep learning applications in real-world settings.\nAdditionally, collaborations and open-source efforts have nurtured a dynamic community of researchers and practitioners, accelerating advancements in deep learning techniques. Innovations like deep reinforcement learning, transfer learning, and generative adversarial networks have broadened the scope of what is achievable with deep learning, opening new possibilities in various sectors, including healthcare, finance, transportation, and entertainment.\nOrganizations worldwide recognize deep learning’s transformative potential and invest heavily in research and development to leverage its capabilities in providing innovative solutions, optimizing operations, and creating new business opportunities. As deep learning continues its upward trajectory, it is set to redefine how we interact with technology, enhancing convenience, safety, and connectivity in our lives.\n\n\n3.1.3 Applications of Deep Learning\nDeep learning is extensively used across numerous industries today. In finance, it is employed for stock market prediction, risk assessment, and fraud detection. Marketing uses it for customer segmentation, personalization, and content optimization. In healthcare, machine learning aids in diagnosis, treatment planning, and patient monitoring. The transformative impact on society is evident.\nFor instance, deep learning algorithms can predict stock market trends, guide investment strategies, and enhance financial decisions. Similarly, in healthcare, deep learning can make medical predictions that improve patient diagnosis and save lives. The benefits are clear: machine learning predicts with greater accuracy than humans and does so much more quickly.\nIn manufacturing, deep learning has had a significant impact. By continuously learning from vast amounts of data collected during manufacturing, companies can boost productivity while minimizing waste through improved efficiency. This financial benefit for companies translates to better quality products at lower customer prices. Machine learning enables manufacturers to continually refine their processes, producing higher quality goods more efficiently than ever.\nDeep learning enhances everyday products like Netflix recommendations and Google Translate text translations. Moreover, it helps companies like Amazon and Uber reduce customer service costs by swiftly identifying dissatisfied customers.\n\n\n3.1.4 Relevance to Embedded AI\nEmbedded AI, the integration of AI algorithms directly into hardware devices, naturally gains from deep learning capabilities. Combining deep learning algorithms and embedded systems has laid the groundwork for intelligent, autonomous devices capable of advanced on-device data processing and analysis. Deep learning aids in extracting complex patterns and information from input data, which is essential in developing smart embedded systems, from household appliances to industrial machinery. This collaboration aims to usher in a new era of intelligent, interconnected devices that can learn and adapt to user behavior and environmental conditions, optimizing performance and offering unprecedented convenience and efficiency.", "crumbs": [ - "MAIN", + "Fundamentals", "3  DL Primer" ] }, @@ -317,9 +317,9 @@ "href": "contents/dl_primer/dl_primer.html#neural-networks", "title": "3  DL Primer", "section": "3.2 Neural Networks", - "text": "3.2 Neural Networks\nDeep learning draws inspiration from the human brain’s neural networks to create decision-making patterns. This section delves into the foundational concepts of deep learning, providing insights into the more complex topics discussed later in this primer.\nNeural networks serve as the foundation of deep learning, inspired by the biological neural networks in the human brain to process and analyze data hierarchically. Below, we examine the primary components and structures in neural networks.\n\n3.2.1 Perceptrons\nThe Perceptron is the basic unit or node that is the foundation for more complex structures. It takes various inputs, applies weights and biases to them, and then uses an activation function to produce an output. Figure 3.3 illustrates the building blocks of a perceptron. In simple terms, think of a perceptron as a tiny decision-maker that learns to make a binary decision (e.g. ‘yes’ or ‘no’). It takes in numbers as inputs (x_1, x_2, ...), each representing a feature of an object we wish to analyze (an image for example). Then it multiplies each input by a weight, adds them up, and if the total is high enough (crosses a certain threshold), it returns “yes” as an asnwer, otherwise, it outputs “no.”\n\n\n\n\n\n\nFigure 3.3: Perceptron. Credit: Wikimedia - Chrislb.\n\n\n\nConceived in the 1950s, perceptrons paved the way for developing more intricate neural networks and have been a fundamental building block in deep learning.\n\n\n3.2.2 Multilayer Perceptrons\nMultilayer perceptrons (MLPs) are an evolution of the single-layer perceptron model, featuring multiple layers of nodes connected in a feedforward manner, as shown in Figure 3.4. These layers include an input layer for data reception, several hidden layers for data processing, and an output layer for final result generation. MLPs are skilled at identifying non-linear relationships and use a backpropagation technique for training, where weights are optimized through a gradient descent algorithm.\n\n\n\n\n\n\nFigure 3.4: Multilayer Perceptron. Credit: Wikimedia - Charlie.\n\n\n\n\nForward Pass\nThe forward pass is the initial phase where data moves through the network from the input to the output layer. During this phase, each layer performs specific computations on the input data, using weights and biases before passing the resulting values to subsequent layers. The final output of this phase is used to compute the loss, indicating the difference between the predicted output and actual target values.\nThe video below explains how neural networks work using handwritten digit recognition as an example application. It also touches on the math underlying neural nets.\n\n\n\nBackward Pass (Backpropagation)\nBackpropagation is a key algorithm in training deep neural networks. This phase involves calculating the gradient of the loss function concerning each weight using the chain rule, effectively moving backward through the network. The gradients calculated in this step guide the adjustment of weights to minimize the loss function, thereby enhancing the network’s performance with each iteration of training.\nGrasping these foundational concepts paves the way to understanding more intricate deep learning architectures and techniques, fostering the development of more sophisticated and productive applications, especially within embedded AI systems.\nThe following two videos build upon the previous one. They cover gradient descent and backpropagation in neural networks.\n\n\n\n\n\n3.2.3 Model Architectures\nDeep learning architectures refer to the various structured approaches that dictate how neurons and layers are organized and interact in neural networks. These architectures have evolved to tackle different problems and data types effectively. This section overviews some well-known deep learning architectures and their characteristics.\n\nMultilayer Perceptrons (MLPs)\nMLPs are basic deep learning architectures comprising three layers: an input layer, one or more hidden layers, and an output layer. These layers are fully connected, meaning each neuron in a layer is linked to every neuron in the preceding and following layers. MLPs can model intricate functions and are used in various tasks, such as regression, classification, and pattern recognition. Their capacity to learn non-linear relationships through backpropagation makes them a versatile instrument in the deep learning toolkit.\nIn embedded AI systems, MLPs can function as compact models for simpler tasks like sensor data analysis or basic pattern recognition, where computational resources are limited. Their ability to learn non-linear relationships with relatively less complexity makes them a suitable choice for embedded systems.\n\n\n\n\n\n\nExercise 3.1: Multilayer Perceptrons (MLPs)\n\n\n\n\n\nWe’ve just scratched the surface of neural networks. Now, you’ll get to try and apply these concepts in practical examples. In the provided Colab notebooks, you’ll explore:\nPredicting house prices: Learn how neural networks can analyze housing data to estimate property values.   \nImage Classification: Discover how to build a network to understand the famous MNIST handwritten digit dataset.   \nReal-world medical diagnosis: Use deep learning to tackle the important task of breast cancer classification.   \n\n\n\n\n\nConvolutional Neural Networks (CNNs)\nCNNs are mainly used in image and video recognition tasks. This architecture employs convolutional layers that filter input data to identify features like edges, corners, and textures. A typical CNN also includes pooling layers to reduce the spatial dimensions of the data and fully connected layers for classification. CNNs have proven highly effective in image recognition, object detection, and computer vision applications.\nIn embedded AI, CNNs are crucial for image and video recognition tasks, where real-time processing is often needed. They can be optimized for embedded systems using techniques like quantization and pruning to minimize memory usage and computational demands, enabling efficient object detection and facial recognition functionalities in devices with limited computational resources.\n\n\n\n\n\n\nExercise 3.2: Convolutional Neural Networks (CNNs)\n\n\n\n\n\nWe discussed that CNNs excel at identifying image features, making them ideal for tasks like object classification. Now, you’ll get to put this knowledge into action! This Colab notebook focuses on building a CNN to classify images from the CIFAR-10 dataset, which includes objects like airplanes, cars, and animals. You’ll learn about the key differences between CIFAR-10 and the MNIST dataset we explored earlier and how these differences influence model choice. By the end of this notebook, you’ll have a grasp of CNNs for image recognition and be well on your way to becoming a TinyML expert!      \n\n\n\n\n\nRecurrent Neural Networks (RNNs)\nRNNs are suitable for sequential data analysis, like time series forecasting and natural language processing. In this architecture, connections between nodes form a directed graph along a temporal sequence, allowing information to be carried across sequences through hidden state vectors. Variants of RNNs include Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), designed to capture longer dependencies in sequence data.\nThese networks can be used in voice recognition systems, predictive maintenance, or IoT devices where sequential data patterns are common. Optimizations specific to embedded platforms can assist in managing their typically high computational and memory requirements.\n\n\nGenerative Adversarial Networks (GANs)\nGANs consist of two networks, a generator and a discriminator, trained simultaneously through adversarial training (Goodfellow et al. 2020). The generator produces data that tries to mimic the real data distribution, while the discriminator aims to distinguish between real and generated data. GANs are widely used in image generation, style transfer, and data augmentation.\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. “Generative Adversarial Networks.” Commun. ACM 63 (11): 139–44. https://doi.org/10.1145/3422622.\nIn embedded settings, GANs could be used for on-device data augmentation to enhance the training of models directly on the embedded device, enabling continual learning and adaptation to new data without the need for cloud computing resources.\n\n\nAutoencoders\nAutoencoders are neural networks for data compression and noise reduction (Bank, Koenigstein, and Giryes 2023). They are structured to encode input data into a lower-dimensional representation and then decode it back to its original form. Variants like Variational Autoencoders (VAEs) introduce probabilistic layers that allow for generative properties, finding applications in image generation and anomaly detection.\n\nBank, Dor, Noam Koenigstein, and Raja Giryes. 2023. “Autoencoders.” Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, 353–74.\nUsing autoencoders can help in efficient data transmission and storage, improving the overall performance of embedded systems with limited computational and memory resources.\n\n\nTransformer Networks\nTransformer networks have emerged as a powerful architecture, especially in natural language processing (Vaswani et al. 2017). These networks use self-attention mechanisms to weigh the influence of different input words on each output word, enabling parallel computation and capturing intricate patterns in data. Transformer networks have led to state-of-the-art results in tasks like language translation, summarization, and text generation.\n\nVaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.\nThese networks can be optimized to perform language-related tasks directly on the device. For example, transformers can be used in embedded systems for real-time translation services or voice-assisted interfaces, where latency and computational efficiency are crucial. Techniques such as model distillation can be employed to deploy these networks on embedded devices with limited resources.\nThese architectures serve specific purposes and excel in different domains, offering a rich toolkit for addressing diverse problems in embedded AI systems. Understanding the nuances of these architectures is crucial in designing effective and efficient deep learning models for various applications.\n\n\n\n3.2.4 Traditional ML vs Deep Learning\nTo briefly highlight the differences, Table 3.1 illustrates the contrasting characteristics between traditional ML and deep learning:\n\n\n\nTable 3.1: Comparison of traditional machine learning and deep learning.\n\n\n\n\n\n\n\n\n\n\nAspect\nTraditional ML\nDeep Learning\n\n\n\n\nData Requirements\nLow to Moderate (efficient with smaller datasets)\nHigh (requires large datasets for nuanced learning)\n\n\nModel Complexity\nModerate (suitable for well-defined problems)\nHigh (detects intricate patterns, suited for complex tasks)\n\n\nComputational Resources\nLow to Moderate (cost-effective, less resource-intensive)\nHigh (demands substantial computational power and resources)\n\n\nDeployment Speed\nFast (quicker training and deployment cycles)\nSlow (prolonged training times, especially with larger datasets)\n\n\nInterpretability\nHigh (clear insights into decision pathways)\nLow (complex layered structures, “black box” nature)\n\n\nMaintenance\nEasier (simple to update and maintain)\nComplex (requires more efforts in maintenance and updates)\n\n\n\n\n\n\n\n\n3.2.5 Choosing Traditional ML vs. DL\n\nData Availability and Volume\n* Amount of Data: Traditional machine learning algorithms, such as decision trees or Naive Bayes, are often more suitable when data availability is limited. They offer robust predictions even with smaller datasets. This is particularly true in medical diagnostics for disease prediction and customer segmentation in marketing.\n* Data Diversity and Quality: Traditional machine learning algorithms often work well with structured data (the input to the model is a set of features, ideally independent of each other) but may require significant preprocessing effort (i.e., feature engineering). On the other hand, deep learning takes the approach of automatically performing feature engineering as part of the model architecture. This approach enables the construction of end-to-end models capable of directly mapping from unstructured input data (such as text, audio, and images) to the desired output without relying on simplistic heuristics that have limited effectiveness. However, this results in larger models demanding more data and computational resources. In noisy data, the necessity for larger datasets is further emphasized when utilizing Deep Learning.\n\n\nComplexity of the Problem\n Problem Granularity: Problems that are simple to moderately complex, which may involve linear or polynomial relationships between variables, often find a better fit with traditional machine learning methods.     Hierarchical Feature Representation: Deep learning models are excellent in tasks that require hierarchical feature representation, such as image and speech recognition. However, not all problems require this complexity, and traditional machine learning algorithms may sometimes offer simpler and equally effective solutions.\n\n\nHardware and Computational Resources\n Resource Constraints: The availability of computational resources often influences the choice between traditional ML and deep learning. The former is generally less resource-intensive and thus preferable in environments with hardware limitations or budget constraints.     Scalability and Speed: Traditional machine learning algorithms, like support vector machines (SVM), often allow for faster training times and easier scalability, which is particularly beneficial in projects with tight timelines and growing data volumes.\n\n\nRegulatory Compliance\nRegulatory compliance is crucial in various industries, requiring adherence to guidelines and best practices such as the GDPR in the EU. Traditional ML models, due to their inherent interpretability, often align better with these regulations, especially in sectors like finance and healthcare.\n\n\nInterpretability\nUnderstanding the decision-making process is easier with traditional machine learning techniques than deep learning models, which function as “black boxes,” making it challenging to trace decision pathways.\n\n\n\n3.2.6 Making an Informed Choice\nGiven the constraints of embedded AI systems, understanding the differences between traditional ML techniques and deep learning becomes essential. Both avenues offer unique advantages, and their distinct characteristics often dictate the choice of one over the other in different scenarios.\nDespite this, deep learning has steadily outperformed traditional machine learning methods in several key areas due to abundant data, computational advancements, and proven effectiveness in complex tasks.\nHere are some specific reasons why we focus on deep learning in this text:\n1. Superior Performance in Complex Tasks: Deep learning models, particularly deep neural networks, excel in tasks where the relationships between data points are incredibly intricate. Tasks like image and speech recognition, language translation, and playing complex games like Go and Chess have seen significant advancements primarily through deep learning algorithms.\n2. Efficient Handling of Unstructured Data: Unlike traditional machine learning methods, deep learning can more effectively process unstructured data. This is crucial in today’s data landscape, where the vast majority of data, such as text, images, and videos, is unstructured.\n3. Leveraging Big Data: With the availability of big data, deep learning models can learn and improve continually. These models excel at utilizing large datasets to enhance their predictive accuracy, a limitation in traditional machine-learning approaches.\n4. Hardware Advancements and Parallel Computing: The advent of powerful GPUs and the availability of cloud computing platforms have enabled the rapid training of deep learning models. These advancements have addressed one of deep learning’s significant challenges: the need for substantial computational resources.\n5. Dynamic Adaptability and Continuous Learning: Deep learning models can dynamically adapt to new information or data. They can be trained to generalize their learning to new, unseen data, crucial in rapidly evolving fields like autonomous driving or real-time language translation.\nWhile deep learning has gained significant traction, it’s essential to understand that traditional machine learning is still relevant. As we delve deeper into the intricacies of deep learning, we will also highlight situations where traditional machine learning methods may be more appropriate due to their simplicity, efficiency, and interpretability. By focusing on deep learning in this text, we aim to equip readers with the knowledge and tools to tackle modern, complex problems across various domains while also providing insights into the comparative advantages and appropriate application scenarios for deep learning and traditional machine learning techniques.", + "text": "3.2 Neural Networks\nDeep learning draws inspiration from the human brain’s neural networks to create decision-making patterns. This section delves into the foundational concepts of deep learning, providing insights into the more complex topics discussed later in this primer.\nNeural networks serve as the foundation of deep learning, inspired by the biological neural networks in the human brain to process and analyze data hierarchically. Below, we examine the primary components and structures in neural networks.\n\n3.2.1 Perceptrons\nThe Perceptron is the basic unit or node that is the foundation for more complex structures. It takes various inputs, applies weights and biases to them, and then uses an activation function to produce an output. Figure 3.3 illustrates the building blocks of a perceptron. In simple terms, think of a perceptron as a tiny decision-maker that learns to make a binary decision (e.g. ‘yes’ or ‘no’). It takes in numbers as inputs (x_1, x_2, ...), each representing a feature of an object we wish to analyze (an image for example). Then it multiplies each input by a weight, adds them up, and if the total is high enough (crosses a certain threshold), it returns “yes” as an asnwer, otherwise, it outputs “no.”\n\n\n\n\n\n\nFigure 3.3: Perceptron. Credit: Wikimedia - Chrislb.\n\n\n\nConceived in the 1950s, perceptrons paved the way for developing more intricate neural networks and have been a fundamental building block in deep learning.\n\n\n3.2.2 Multilayer Perceptrons\nMultilayer perceptrons (MLPs) are an evolution of the single-layer perceptron model, featuring multiple layers of nodes connected in a feedforward manner, as shown in Figure 3.4. These layers include an input layer for data reception, several hidden layers for data processing, and an output layer for final result generation. MLPs are skilled at identifying non-linear relationships and use a backpropagation technique for training, where weights are optimized through a gradient descent algorithm.\n\n\n\n\n\n\nFigure 3.4: Multilayer Perceptron. Credit: Wikimedia - Charlie.\n\n\n\n\nForward Pass\nThe forward pass is the initial phase where data moves through the network from the input to the output layer. During this phase, each layer performs specific computations on the input data, using weights and biases before passing the resulting values to subsequent layers. The final output of this phase is used to compute the loss, indicating the difference between the predicted output and actual target values.\nVideo 3.1 below explains how neural networks work using handwritten digit recognition as an example application. It also touches on the math underlying neural nets.\n\n\n\n\n\n\nVideo 3.1: Neural Networks\n\n\n\n\n\n\n\n\nBackward Pass (Backpropagation)\nBackpropagation is a key algorithm in training deep neural networks. This phase involves calculating the gradient of the loss function concerning each weight using the chain rule, effectively moving backward through the network. The gradients calculated in this step guide the adjustment of weights to minimize the loss function, thereby enhancing the network’s performance with each iteration of training.\nGrasping these foundational concepts paves the way to understanding more intricate deep learning architectures and techniques, fostering the development of more sophisticated and productive applications, especially within embedded AI systems.\nVideo 3.2 and Video 3.3 build upon Video 3.1. They cover gradient descent and backpropagation in neural networks.\n\n\n\n\n\n\nVideo 3.2: Gradient descent\n\n\n\n\n\n\n\n\n\n\n\n\nVideo 3.3: Backpropagation\n\n\n\n\n\n\n\n\n\n3.2.3 Model Architectures\nDeep learning architectures refer to the various structured approaches that dictate how neurons and layers are organized and interact in neural networks. These architectures have evolved to tackle different problems and data types effectively. This section overviews some well-known deep learning architectures and their characteristics.\n\nMultilayer Perceptrons (MLPs)\nMLPs are basic deep learning architectures comprising three layers: an input layer, one or more hidden layers, and an output layer. These layers are fully connected, meaning each neuron in a layer is linked to every neuron in the preceding and following layers. MLPs can model intricate functions and are used in various tasks, such as regression, classification, and pattern recognition. Their capacity to learn non-linear relationships through backpropagation makes them a versatile instrument in the deep learning toolkit.\nIn embedded AI systems, MLPs can function as compact models for simpler tasks like sensor data analysis or basic pattern recognition, where computational resources are limited. Their ability to learn non-linear relationships with relatively less complexity makes them a suitable choice for embedded systems.\n\n\n\n\n\n\nExercise 3.1: Multilayer Perceptrons (MLPs)\n\n\n\n\n\nWe’ve just scratched the surface of neural networks. Now, you’ll get to try and apply these concepts in practical examples. In the provided Colab notebooks, you’ll explore:\nPredicting house prices: Learn how neural networks can analyze housing data to estimate property values.   \nImage Classification: Discover how to build a network to understand the famous MNIST handwritten digit dataset.   \nReal-world medical diagnosis: Use deep learning to tackle the important task of breast cancer classification.   \n\n\n\n\n\nConvolutional Neural Networks (CNNs)\nCNNs are mainly used in image and video recognition tasks. This architecture employs convolutional layers that filter input data to identify features like edges, corners, and textures. A typical CNN also includes pooling layers to reduce the spatial dimensions of the data and fully connected layers for classification. CNNs have proven highly effective in image recognition, object detection, and computer vision applications.\nIn embedded AI, CNNs are crucial for image and video recognition tasks, where real-time processing is often needed. They can be optimized for embedded systems using techniques like quantization and pruning to minimize memory usage and computational demands, enabling efficient object detection and facial recognition functionalities in devices with limited computational resources.\n\n\n\n\n\n\nExercise 3.2: Convolutional Neural Networks (CNNs)\n\n\n\n\n\nWe discussed that CNNs excel at identifying image features, making them ideal for tasks like object classification. Now, you’ll get to put this knowledge into action! This Colab notebook focuses on building a CNN to classify images from the CIFAR-10 dataset, which includes objects like airplanes, cars, and animals. You’ll learn about the key differences between CIFAR-10 and the MNIST dataset we explored earlier and how these differences influence model choice. By the end of this notebook, you’ll have a grasp of CNNs for image recognition and be well on your way to becoming a TinyML expert!      \n\n\n\n\n\nRecurrent Neural Networks (RNNs)\nRNNs are suitable for sequential data analysis, like time series forecasting and natural language processing. In this architecture, connections between nodes form a directed graph along a temporal sequence, allowing information to be carried across sequences through hidden state vectors. Variants of RNNs include Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), designed to capture longer dependencies in sequence data.\nThese networks can be used in voice recognition systems, predictive maintenance, or IoT devices where sequential data patterns are common. Optimizations specific to embedded platforms can assist in managing their typically high computational and memory requirements.\n\n\nGenerative Adversarial Networks (GANs)\nGANs consist of two networks, a generator and a discriminator, trained simultaneously through adversarial training (Goodfellow et al. 2020). The generator produces data that tries to mimic the real data distribution, while the discriminator aims to distinguish between real and generated data. GANs are widely used in image generation, style transfer, and data augmentation.\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. “Generative Adversarial Networks.” Commun. ACM 63 (11): 139–44. https://doi.org/10.1145/3422622.\nIn embedded settings, GANs could be used for on-device data augmentation to enhance the training of models directly on the embedded device, enabling continual learning and adaptation to new data without the need for cloud computing resources.\n\n\nAutoencoders\nAutoencoders are neural networks for data compression and noise reduction (Bank, Koenigstein, and Giryes 2023). They are structured to encode input data into a lower-dimensional representation and then decode it back to its original form. Variants like Variational Autoencoders (VAEs) introduce probabilistic layers that allow for generative properties, finding applications in image generation and anomaly detection.\n\nBank, Dor, Noam Koenigstein, and Raja Giryes. 2023. “Autoencoders.” Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, 353–74.\nUsing autoencoders can help in efficient data transmission and storage, improving the overall performance of embedded systems with limited computational and memory resources.\n\n\nTransformer Networks\nTransformer networks have emerged as a powerful architecture, especially in natural language processing (Vaswani et al. 2017). These networks use self-attention mechanisms to weigh the influence of different input words on each output word, enabling parallel computation and capturing intricate patterns in data. Transformer networks have led to state-of-the-art results in tasks like language translation, summarization, and text generation.\n\nVaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.\nThese networks can be optimized to perform language-related tasks directly on the device. For example, transformers can be used in embedded systems for real-time translation services or voice-assisted interfaces, where latency and computational efficiency are crucial. Techniques such as model distillation can be employed to deploy these networks on embedded devices with limited resources.\nThese architectures serve specific purposes and excel in different domains, offering a rich toolkit for addressing diverse problems in embedded AI systems. Understanding the nuances of these architectures is crucial in designing effective and efficient deep learning models for various applications.\n\n\n\n3.2.4 Traditional ML vs Deep Learning\nTo briefly highlight the differences, Table 3.1 illustrates the contrasting characteristics between traditional ML and deep learning:\n\n\n\nTable 3.1: Comparison of traditional machine learning and deep learning.\n\n\n\n\n\n\n\n\n\n\nAspect\nTraditional ML\nDeep Learning\n\n\n\n\nData Requirements\nLow to Moderate (efficient with smaller datasets)\nHigh (requires large datasets for nuanced learning)\n\n\nModel Complexity\nModerate (suitable for well-defined problems)\nHigh (detects intricate patterns, suited for complex tasks)\n\n\nComputational Resources\nLow to Moderate (cost-effective, less resource-intensive)\nHigh (demands substantial computational power and resources)\n\n\nDeployment Speed\nFast (quicker training and deployment cycles)\nSlow (prolonged training times, especially with larger datasets)\n\n\nInterpretability\nHigh (clear insights into decision pathways)\nLow (complex layered structures, “black box” nature)\n\n\nMaintenance\nEasier (simple to update and maintain)\nComplex (requires more efforts in maintenance and updates)\n\n\n\n\n\n\n\n\n3.2.5 Choosing Traditional ML vs. DL\n\nData Availability and Volume\n* Amount of Data: Traditional machine learning algorithms, such as decision trees or Naive Bayes, are often more suitable when data availability is limited. They offer robust predictions even with smaller datasets. This is particularly true in medical diagnostics for disease prediction and customer segmentation in marketing.\n* Data Diversity and Quality: Traditional machine learning algorithms often work well with structured data (the input to the model is a set of features, ideally independent of each other) but may require significant preprocessing effort (i.e., feature engineering). On the other hand, deep learning takes the approach of automatically performing feature engineering as part of the model architecture. This approach enables the construction of end-to-end models capable of directly mapping from unstructured input data (such as text, audio, and images) to the desired output without relying on simplistic heuristics that have limited effectiveness. However, this results in larger models demanding more data and computational resources. In noisy data, the necessity for larger datasets is further emphasized when utilizing Deep Learning.\n\n\nComplexity of the Problem\n Problem Granularity: Problems that are simple to moderately complex, which may involve linear or polynomial relationships between variables, often find a better fit with traditional machine learning methods.     Hierarchical Feature Representation: Deep learning models are excellent in tasks that require hierarchical feature representation, such as image and speech recognition. However, not all problems require this complexity, and traditional machine learning algorithms may sometimes offer simpler and equally effective solutions.\n\n\nHardware and Computational Resources\n Resource Constraints: The availability of computational resources often influences the choice between traditional ML and deep learning. The former is generally less resource-intensive and thus preferable in environments with hardware limitations or budget constraints.     Scalability and Speed: Traditional machine learning algorithms, like support vector machines (SVM), often allow for faster training times and easier scalability, which is particularly beneficial in projects with tight timelines and growing data volumes.\n\n\nRegulatory Compliance\nRegulatory compliance is crucial in various industries, requiring adherence to guidelines and best practices such as the GDPR in the EU. Traditional ML models, due to their inherent interpretability, often align better with these regulations, especially in sectors like finance and healthcare.\n\n\nInterpretability\nUnderstanding the decision-making process is easier with traditional machine learning techniques than deep learning models, which function as “black boxes,” making it challenging to trace decision pathways.\n\n\n\n3.2.6 Making an Informed Choice\nGiven the constraints of embedded AI systems, understanding the differences between traditional ML techniques and deep learning becomes essential. Both avenues offer unique advantages, and their distinct characteristics often dictate the choice of one over the other in different scenarios.\nDespite this, deep learning has steadily outperformed traditional machine learning methods in several key areas due to abundant data, computational advancements, and proven effectiveness in complex tasks.\nHere are some specific reasons why we focus on deep learning in this text:\n1. Superior Performance in Complex Tasks: Deep learning models, particularly deep neural networks, excel in tasks where the relationships between data points are incredibly intricate. Tasks like image and speech recognition, language translation, and playing complex games like Go and Chess have seen significant advancements primarily through deep learning algorithms.\n2. Efficient Handling of Unstructured Data: Unlike traditional machine learning methods, deep learning can more effectively process unstructured data. This is crucial in today’s data landscape, where the vast majority of data, such as text, images, and videos, is unstructured.\n3. Leveraging Big Data: With the availability of big data, deep learning models can learn and improve continually. These models excel at utilizing large datasets to enhance their predictive accuracy, a limitation in traditional machine-learning approaches.\n4. Hardware Advancements and Parallel Computing: The advent of powerful GPUs and the availability of cloud computing platforms have enabled the rapid training of deep learning models. These advancements have addressed one of deep learning’s significant challenges: the need for substantial computational resources.\n5. Dynamic Adaptability and Continuous Learning: Deep learning models can dynamically adapt to new information or data. They can be trained to generalize their learning to new, unseen data, crucial in rapidly evolving fields like autonomous driving or real-time language translation.\nWhile deep learning has gained significant traction, it’s essential to understand that traditional machine learning is still relevant. As we delve deeper into the intricacies of deep learning, we will also highlight situations where traditional machine learning methods may be more appropriate due to their simplicity, efficiency, and interpretability. By focusing on deep learning in this text, we aim to equip readers with the knowledge and tools to tackle modern, complex problems across various domains while also providing insights into the comparative advantages and appropriate application scenarios for deep learning and traditional machine learning techniques.", "crumbs": [ - "MAIN", + "Fundamentals", "3  DL Primer" ] }, @@ -330,7 +330,7 @@ "section": "3.3 Conclusion", "text": "3.3 Conclusion\nDeep learning has become a potent set of techniques for addressing intricate pattern recognition and prediction challenges. Starting with an overview, we outlined the fundamental concepts and principles governing deep learning, laying the groundwork for more advanced studies.\nCentral to deep learning, we explored the basic ideas of neural networks, powerful computational models inspired by the human brain’s interconnected neuron structure. This exploration allowed us to appreciate neural networks’ capabilities and potential in creating sophisticated algorithms capable of learning and adapting from data.\nUnderstanding the role of libraries and frameworks was a key part of our discussion. We offered insights into the tools that can facilitate developing and deploying deep learning models. These resources ease the implementation of neural networks and open avenues for innovation and optimization.\nNext, we tackled the challenges one might face when embedding deep learning algorithms within embedded systems, providing a critical perspective on the complexities and considerations of bringing AI to edge devices.\nFurthermore, we examined deep learning’s limitations. Through discussions, we unraveled the challenges faced in deep learning applications and outlined scenarios where traditional machine learning might outperform deep learning. These sections are crucial for fostering a balanced view of deep learning’s capabilities and limitations.\nIn this primer, we have equipped you with the knowledge to make informed choices between deploying traditional machine learning or deep learning techniques, depending on the unique demands and constraints of a specific problem.\nAs we conclude this chapter, we hope you are now well-equipped with the basic “language” of deep learning and prepared to delve deeper into the subsequent chapters with a solid understanding and critical perspective. The journey ahead is filled with exciting opportunities and challenges in embedding AI within systems.", "crumbs": [ - "MAIN", + "Fundamentals", "3  DL Primer" ] }, @@ -339,9 +339,9 @@ "href": "contents/dl_primer/dl_primer.html#sec-deep-learning-primer-resource", "title": "3  DL Primer", "section": "3.4 Resources", - "text": "3.4 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nPast, Present, and Future of ML.\nThinking About Loss.\nMinimizing Loss.\nFirst Neural Network.\nUnderstanding Neurons.\nIntro to CLassification.\nTraining, Validation, and Test Data.\nIntro to Convolutions.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 3.1\nExercise 3.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\n\nComing soon.", + "text": "3.4 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nPast, Present, and Future of ML.\nThinking About Loss.\nMinimizing Loss.\nFirst Neural Network.\nUnderstanding Neurons.\nIntro to CLassification.\nTraining, Validation, and Test Data.\nIntro to Convolutions.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 3.1\nVideo 3.2\nVideo 3.3\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 3.1\nExercise 3.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\n\nComing soon.", "crumbs": [ - "MAIN", + "Fundamentals", "3  DL Primer" ] }, @@ -352,7 +352,7 @@ "section": "", "text": "4.1 Overview\nDeveloping a successful machine learning model requires a systematic workflow. This end-to-end process enables you to build, deploy, and maintain models effectively. As shown in Figure 4.1, It typically involves the following key steps:\nFollowing this structured ML workflow helps guide you through the key phases of development. It ensures you build effective and robust models ready for real-world deployment, resulting in higher-quality models that solve your business needs.\nThe ML workflow is iterative, requiring ongoing monitoring and potential adjustments. Additional considerations include:", "crumbs": [ - "MAIN", + "Workflow", "4  AI Workflow" ] }, @@ -363,7 +363,7 @@ "section": "", "text": "Figure 4.1: Multi-step design methodology for the development of a machine learning model. Commonly referred to as the machine learning lifecycle\n\n\n\n\n\nProblem Definition - Start by clearly articulating the specific problem you want to solve. This focuses on your efforts during data collection and model building.\nData Collection to Preparation: Gather relevant, high-quality training data that captures all aspects of the problem. Clean and preprocess the data to prepare it for modeling.\nModel Selection and Training: Choose a machine learning algorithm suited to your problem type and data. Consider the pros and cons of different approaches. Feed the prepared data into the model to train it. Training time varies based on data size and model complexity.\nModel Evaluation: Test the trained model on new unseen data to measure its predictive accuracy. Identify any limitations.\nModel Deployment: Integrate the validated model into applications or systems to start operationalization.\nMonitor and Maintain: Track model performance in production. Retrain periodically on new data to keep it current.\n\n\n\n\nVersion Control: Track code and data changes to reproduce results and revert to earlier versions if needed.\nDocumentation: Maintain detailed documentation for workflow understanding and reproduction.\nTesting: Rigorously test the workflow to ensure its functionality.\nSecurity: Safeguard your workflow and data when deploying models in production settings.", "crumbs": [ - "MAIN", + "Workflow", "4  AI Workflow" ] }, @@ -374,7 +374,7 @@ "section": "4.2 Traditional vs. Embedded AI", "text": "4.2 Traditional vs. Embedded AI\nThe ML workflow is a universal guide applicable across various platforms, including cloud-based solutions, edge computing, and TinyML. However, the workflow for Embedded AI introduces unique complexities and challenges, making it a captivating domain and paving the way for remarkable innovations.\n\n4.2.1 Resource Optimization\n\nTraditional ML Workflow: This workflow prioritizes model accuracy and performance, often leveraging abundant computational resources in cloud or data center environments.\nEmbedded AI Workflow: Given embedded systems’ resource constraints, this workflow requires careful planning to optimize model size and computational demands. Techniques like model quantization and pruning are crucial.\n\n\n\n4.2.2 Real-time Processing\n\nTraditional ML Workflow: Less emphasis on real-time processing, often relying on batch data processing.\nEmbedded AI Workflow: Prioritizes real-time data processing, making low latency and quick execution essential, especially in applications like autonomous vehicles and industrial automation.\n\n\n\n4.2.3 Data Management and Privacy\n\nTraditional ML Workflow: Processes data in centralized locations, often necessitating extensive data transfer and focusing on data security during transit and storage.\nEmbedded AI Workflow: This workflow leverages edge computing to process data closer to its source, reducing data transmission and enhancing privacy through data localization.\n\n\n\n4.2.4 Hardware-Software Integration\n\nTraditional ML Workflow: Typically operates on general-purpose hardware, with software development occurring independently.\nEmbedded AI Workflow: This workflow involves a more integrated approach to hardware and software development, often incorporating custom chips or hardware accelerators to achieve optimal performance.", "crumbs": [ - "MAIN", + "Workflow", "4  AI Workflow" ] }, @@ -385,7 +385,7 @@ "section": "4.3 Roles & Responsibilities", "text": "4.3 Roles & Responsibilities\nCreating an ML solution, especially for embedded AI, is a multidisciplinary effort involving various specialists.\nTable 4.1 shows a rundown of the typical roles involved:\n\n\n\nTable 4.1: Roles and responsibilities of people involved in MLOps.\n\n\n\n\n\n\n\n\n\nRole\nResponsibilities\n\n\n\n\nProject Manager\nOversees the project, ensuring timelines and milestones are met.\n\n\nDomain Experts\nOffer domain-specific insights to define project requirements.\n\n\nData Scientists\nSpecialize in data analysis and model development.\n\n\nMachine Learning Engineers\nFocus on model development and deployment.\n\n\nData Engineers\nManage data pipelines.\n\n\nEmbedded Systems Engineers\nIntegrate ML models into embedded systems.\n\n\nSoftware Developers\nDevelop software components for AI system integration.\n\n\nHardware Engineers\nDesign and optimize hardware for the embedded AI system.\n\n\nUI/UX Designers\nFocus on user-centric design.\n\n\nQA Engineers\nEnsure the system meets quality standards.\n\n\nEthicists and Legal Advisors\nConsult on ethical and legal compliance.\n\n\nOperations and Maintenance Personnel\nMonitor and maintain the deployed system.\n\n\nSecurity Specialists\nEnsure system security.\n\n\n\n\n\n\nUnderstanding these roles is crucial for completing an ML project. As we proceed through the upcoming chapters, we’ll delve into each role’s essence and expertise, fostering a comprehensive understanding of the complexities involved in embedded AI projects. This holistic view facilitates seamless collaboration and nurtures an environment ripe for innovation and breakthroughs.", "crumbs": [ - "MAIN", + "Workflow", "4  AI Workflow" ] }, @@ -394,9 +394,9 @@ "href": "contents/workflow/workflow.html#sec-ai-workflow-resource", "title": "4  AI Workflow", "section": "4.4 Resources", - "text": "4.4 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nML Workflow.\nML Lifecycle.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "4.4 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nML Workflow.\nML Lifecycle.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Workflow", "4  AI Workflow" ] }, @@ -405,9 +405,9 @@ "href": "contents/data_engineering/data_engineering.html", "title": "5  Data Engineering", "section": "", - "text": "5.1 Introduction\nDataset creators face complex privacy and representation challenges when building high-quality training data, especially for sensitive domains like healthcare. Legally, creators may need to remove direct identifiers like names and ages. Even without legal obligations, removing such information can help build user trust. However, excessive anonymization can compromise dataset utility. Techniques like differential privacy\\(^{1}\\), aggregation, and reducing detail provide alternatives to balance privacy and utility but have downsides. Creators must strike a thoughtful balance based on the use case.\nLooking beyond privacy, creators need to proactively assess and address representation gaps that could introduce model biases. It is crucial yet insufficient to ensure diversity across individual variables like gender, race, and accent. Combinations of characteristics also require assessment, as models can only work when certain intersections are present. For example, a medical dataset could have balanced gender, age, and diagnosis data individually, but it lacks enough cases to capture older women with a specific condition. Such higher-order gaps are not immediately obvious but can critically impact model performance.\nCreating useful, ethical training data requires holistic consideration of privacy risks and representation gaps. Perfect solutions are elusive. However, conscientious data engineering practices like anonymization, aggregation, undersampling of overrepresented groups, and synthesized data generation can help balance competing needs. This facilitates models that are both accurate and socially responsible. Cross-functional collaboration and external audits can also strengthen training data. The challenges are multifaceted but surmountable with thoughtful effort.\nWe begin by discussing data collection: Where do we source data, and how do we gather it? Options range from scraping the web, accessing APIs, and utilizing sensors and IoT devices to conducting surveys and gathering user input. These methods reflect real-world practices. Next, we delve into data labeling, including considerations for human involvement. We’ll discuss the tradeoffs and limitations of human labeling and explore emerging methods for automated labeling. Following that, we’ll address data cleaning and preprocessing, a crucial yet frequently undervalued step in preparing raw data for AI model training. Data augmentation comes next, a strategy for enhancing limited datasets by generating synthetic samples. This is particularly pertinent for embedded systems, as many use cases need extensive data repositories readily available for curation. Synthetic data generation emerges as a viable alternative, though it has its own advantages and disadvantages. We’ll also touch upon dataset versioning, emphasizing the importance of tracking data modifications over time. Data is ever-evolving; hence, it’s imperative to devise strategies for managing and storing expansive datasets. By the end of this section, you’ll possess a comprehensive understanding of the entire data pipeline, from collection to storage, essential for operationalizing AI systems. Let’s embark on this journey!", + "text": "5.1 Introduction\nImagine a world where AI can diagnose diseases with unprecedented accuracy, but only if the data used to train it is unbiased and reliable. This is where data engineering comes in. While over 90% of the world’s data has been created in the past two decades, this vast amount of information is only helpful for building effective AI models with proper processing and preparation. Data engineering bridges this gap by transforming raw data into a high-quality format that fuels AI innovation. In today’s data-driven world, protecting user privacy is paramount. Whether mandated by law or driven by user concerns, anonymization techniques like differential privacy and aggregation are vital in mitigating privacy risks. However, careful implementation is crucial to ensure these methods don’t compromise data utility. Dataset creators face complex privacy and representation challenges when building high-quality training data, especially for sensitive domains like healthcare. Legally, creators may need to remove direct identifiers like names and ages. Even without legal obligations, removing such information can help build user trust. However, excessive anonymization can compromise dataset utility. Techniques like differential privacy\\(^{1}\\), aggregation, and reducing detail provide alternatives to balance privacy and utility but have downsides. Creators must strike a thoughtful balance based on the use case.\nWhile privacy is paramount, ensuring fair and robust AI models requires addressing representation gaps in the data. It is crucial yet insufficient to ensure diversity across individual variables like gender, race, and accent. These combinations, sometimes called higher-order gaps, can significantly impact model performance. For example, a medical dataset could have balanced gender, age, and diagnosis data individually, but it lacks enough cases to capture older women with a specific condition. Such higher-order gaps are not immediately obvious but can critically impact model performance.\nCreating useful, ethical training data requires holistic consideration of privacy risks and representation gaps. Elusive perfect solutions necessitate conscientious data engineering practices like anonymization, aggregation, under-sampling of overrepresented groups, and synthesized data generation to balance competing needs. This facilitates models that are both accurate and socially responsible. Cross-functional collaboration and external audits can also strengthen training data. The challenges are multifaceted but surmountable with thoughtful effort.\nWe begin by discussing data collection: Where do we source data, and how do we gather it? Options range from scraping the web, accessing APIs, and utilizing sensors and IoT devices to conducting surveys and gathering user input. These methods reflect real-world practices. Next, we delve into data labeling, including considerations for human involvement. We’ll discuss the trade-offs and limitations of human labeling and explore emerging methods for automated labeling. Following that, we’ll address data cleaning and preprocessing, a crucial yet frequently undervalued step in preparing raw data for AI model training. Data augmentation comes next, a strategy for enhancing limited datasets by generating synthetic samples. This is particularly pertinent for embedded systems, as many use cases need extensive data repositories readily available for curation. Synthetic data generation emerges as a viable alternative with advantages and disadvantages. We’ll also touch upon dataset versioning, emphasizing the importance of tracking data modifications over time. Data is ever-evolving; hence, it’s imperative to devise strategies for managing and storing expansive datasets. By the end of this section, you’ll possess a comprehensive understanding of the entire data pipeline, from collection to storage, essential for operationalizing AI systems. Let’s embark on this journey!", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -418,7 +418,7 @@ "section": "5.2 Problem Definition", "text": "5.2 Problem Definition\nIn many machine learning domains, sophisticated algorithms take center stage, while the fundamental importance of data quality is often overlooked. This neglect gives rise to “Data Cascades” by Sambasivan et al. (2021) (see Figure 5.1)—events where lapses in data quality compound, leading to negative downstream consequences such as flawed predictions, project terminations, and even potential harm to communities. In Figure 5.1, we have an illustration of potential data pitfalls at every stage and how they influence the entire process down the line. The influence of data collection errors is especially pronounced. Any lapses in this stage will become apparent at later stages (in model evaluation and deployment) and might lead to costly consequences, such as abandoning the entire model and restarting anew. Therefore, investing in data engineering techniques from the onset will help us detect errors early.\n\n\n\n\n\n\nFigure 5.1: Data cascades: compounded costs. Credit: Sambasivan et al. (2021).\n\n\nSambasivan, Nithya, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. 2021. ““Everyone Wants to Do the Model Work, Not the Data Work”: Data Cascades in High-Stakes AI.” In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1–15. ACM. https://doi.org/10.1145/3411764.3445518.\n\n\nDespite many ML professionals recognizing the importance of data, numerous practitioners report facing these cascades. This highlights a systemic issue: while the allure of developing advanced models remains, data often needs to be more appreciated.\nTake, for example, Keyword Spotting (KWS) (see Figure 5.2). KWS is a prime example of TinyML in action and is a critical technology behind voice-enabled interfaces on endpoint devices such as smartphones. Typically functioning as lightweight wake-word engines, these systems are consistently active, listening for a specific phrase to trigger further actions. When we say “OK, Google” or “Alexa,” this initiates a process on a microcontroller embedded within the device. Despite their limited resources, these microcontrollers play an important role in enabling seamless voice interactions with devices, often operating in environments with high ambient noise. The uniqueness of the wake word helps minimize false positives, ensuring that the system is not triggered inadvertently.\nIt is important to appreciate that these keyword-spotting technologies are not isolated; they integrate seamlessly into larger systems, processing signals continuously while managing low power consumption. These systems extend beyond simple keyword recognition, evolving to facilitate diverse sound detections, such as glass breaking. This evolution is geared towards creating intelligent devices capable of understanding and responding to vocal commands, heralding a future where even household appliances can be controlled through voice interactions.\n\n\n\n\n\n\nFigure 5.2: Keyword Spotting example: interacting with Alexa. Credit: Amazon.\n\n\n\nBuilding a reliable KWS model is a complex task. It demands a deep understanding of the deployment scenario, encompassing where and how these devices will operate. For instance, a KWS model’s effectiveness is not just about recognizing a word; it’s about discerning it among various accents and background noises, whether in a bustling cafe or amid the blaring sound of a television in a living room or a kitchen where these devices are commonly found. It’s about ensuring that a whispered “Alexa” in the dead of night or a shouted “OK Google” in a noisy marketplace are recognized with equal precision.\nMoreover, many current KWS voice assistants support a limited number of languages, leaving a substantial portion of the world’s linguistic diversity unrepresented. This limitation is partly due to the difficulty in gathering and monetizing data for languages spoken by smaller populations. The long-tail distribution of languages implies that many languages have limited data, making the development of supportive technologies challenging.\nThis level of accuracy and robustness hinges on the availability and quality of data, the ability to label the data correctly, and the transparency of the data for the end user before it is used to train the model. However, it all begins with clearly understanding the problem statement or definition.\nGenerally, in ML, problem definition has a few key steps:\n\nIdentifying the problem definition clearly\nSetting clear objectives\nEstablishing success benchmark\nUnderstanding end-user engagement/use\nUnderstanding the constraints and limitations of deployment\nFollowed by finally doing the data collection.\n\nA solid project foundation is essential for its trajectory and eventual success. Central to this foundation is first identifying a clear problem, such as ensuring that voice commands in voice assistance systems are recognized consistently across varying environments. Clear objectives, like creating representative datasets for diverse scenarios, provide a unified direction. Benchmarks, such as system accuracy in keyword detection, offer measurable outcomes to gauge progress. Engaging with stakeholders, from end-users to investors, provides invaluable insights and ensures alignment with market needs. Additionally, understanding platform constraints is pivotal when delving into areas like voice assistance. Embedded systems, such as microcontrollers, come with inherent processing power, memory, and energy efficiency limitations. Recognizing these limitations ensures that functionalities, like keyword detection, are tailored to operate optimally, balancing performance with resource conservation.\nIn this context, using KWS as an example, we can break each of the steps out as follows:\n\nIdentifying the Problem: At its core, KWS aims to detect specific keywords amidst ambient sounds and other spoken words. The primary problem is to design a system that can recognize these keywords with high accuracy, low latency, and minimal false positives or negatives, especially when deployed on devices with limited computational resources.\nSetting Clear Objectives: The objectives for a KWS system might include:\n\nAchieving a specific accuracy rate (e.g., 98% accuracy in keyword detection).\nEnsuring low latency (e.g., keyword detection and response within 200 milliseconds).\nMinimizing power consumption to extend battery life on embedded devices.\nEnsuring the model’s size is optimized for the available memory on the device.\n\nBenchmarks for Success: Establish clear metrics to measure the success of the KWS system. This could include:\n\nTrue Positive Rate: The percentage of correctly identified keywords.\nFalse Positive Rate: The percentage of non-keywords incorrectly identified as keywords.\nResponse Time: The time taken from keyword utterance to system response.\nPower Consumption: Average power used during keyword detection.\n\nStakeholder Engagement and Understanding: Engage with stakeholders, which include device manufacturers, hardware and software developers, and end-users. Understand their needs, capabilities, and constraints. For instance:\n\nDevice manufacturers might prioritize low power consumption.\nSoftware developers might emphasize ease of integration.\nEnd-users would prioritize accuracy and responsiveness.\n\nUnderstanding the Constraints and Limitations of Embedded Systems: Embedded devices come with their own set of challenges:\n\nMemory Limitations: KWS models must be lightweight to fit within the memory constraints of embedded devices. Typically, KWS models need to be as small as 16KB to fit in the always-on island of the SoC. Moreover, this is just the model size. Additional application code for preprocessing may also need to fit within the memory constraints.\nProcessing Power: The computational capabilities of embedded devices are limited (a few hundred MHz of clock speed), so the KWS model must be optimized for efficiency.\nPower Consumption: Since many embedded devices are battery-powered, the KWS system must be power-efficient.\nEnvironmental Challenges: Devices might be deployed in various environments, from quiet bedrooms to noisy industrial settings. The KWS system must be robust enough to function effectively across these scenarios.\n\nData Collection and Analysis: For a KWS system, the quality and diversity of data are paramount. Considerations might include:\n\nVariety of Accents: Collect data from speakers with various accents to ensure wide-ranging recognition.\nBackground Noises: Include data samples with different ambient noises to train the model for real-world scenarios.\nKeyword Variations: People might either pronounce keywords differently or have slight variations in the wake word itself. Ensure the dataset captures these nuances.\n\nIterative Feedback and Refinement: Once a prototype KWS system is developed, it’s crucial to test it in real-world scenarios, gather feedback, and iteratively refine the model. This ensures that the system remains aligned with the defined problem and objectives. This is important because the deployment scenarios change over time as things evolve.\n\n\n\n\n\n\n\nExercise 5.1: Keyword Spotting with TensorFlow Lite Micro\n\n\n\n\n\nExplore a hands-on guide for building and deploying Keyword Spotting (KWS) systems using TensorFlow Lite Micro. Follow steps from data collection to model training and deployment to microcontrollers. Learn to create efficient KWS models that recognize specific keywords amidst background noise. Perfect for those interested in machine learning on embedded systems. Unlock the potential of voice-enabled devices with TensorFlow Lite Micro!\n\n\n\n\nThe current chapter underscores the essential role of data quality in ML, using Keyword Spotting (KWS) systems as an example. It outlines key steps, from problem definition to stakeholder engagement, emphasizing iterative feedback. The forthcoming chapter will delve deeper into data quality management, discussing its consequences and future trends, focusing on the importance of high-quality, diverse data in AI system development, addressing ethical considerations and data sourcing methods.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -429,7 +429,7 @@ "section": "5.3 Data Sourcing", "text": "5.3 Data Sourcing\nThe quality and diversity of data gathered are important for developing accurate and robust AI systems. Sourcing high-quality training data requires careful consideration of the objectives, resources, and ethical implications. Data can be obtained from various sources depending on the needs of the project:\n\n5.3.1 Pre-existing datasets\nPlatforms like Kaggle and UCI Machine Learning Repository provide a convenient starting point. Pre-existing datasets are valuable for researchers, developers, and businesses. One of their primary advantages is cost efficiency. Creating a dataset from scratch can be time-consuming and expensive, so accessing ready-made data can save significant resources. Moreover, many datasets, like ImageNet, have become standard benchmarks in the machine learning community, allowing for consistent performance comparisons across different models and algorithms. This data availability means that experiments can be started immediately without any data collection and preprocessing delays. In a fast-moving field like ML, this practicality is important.\nThe quality assurance that comes with popular pre-existing datasets is important to consider because several datasets have errors in them. For instance, the ImageNet dataset was found to have over 6.4% errors. Given their widespread use, the community often identifies and rectifies any errors or biases in these datasets. This assurance is especially beneficial for students and newcomers to the field, as they can focus on learning and experimentation without worrying about data integrity. Supporting documentation often accompanying existing datasets is invaluable, though this generally applies only to widely used datasets. Good documentation provides insights into the data collection process and variable definitions and sometimes even offers baseline model performances. This information not only aids understanding but also promotes reproducibility in research, a cornerstone of scientific integrity; currently, there is a crisis around improving reproducibility in machine learning systems. When other researchers have access to the same data, they can validate findings, test new hypotheses, or apply different methodologies, thus allowing us to build on each other’s work more rapidly.\nWhile platforms like Kaggle and UCI Machine Learning Repository are invaluable resources, it’s essential to understand the context in which the data was collected. Researchers should be wary of potential overfitting when using popular datasets, as multiple models might have been trained on them, leading to inflated performance metrics. Sometimes, these datasets do not reflect the real-world data.\nIn addition, bias, validity, and reproducibility issues may exist in these datasets, and there has been a growing awareness of these issues in recent years. Furthermore, using the same dataset to train different models as shown in Figure 5.3 can sometimes create misalignment: training multiple models using the same dataset resultsi in a ‘misalignment’ between the models and the world, in which an entire ecosystem of models reflects only a narrow subset of the real-world data.\n\n\n\n\n\n\nFigure 5.3: Training different models on the same dataset. Credit: (icons from left to right: Becris; Freepik; Freepik; Paul J; SBTS2018).\n\n\n\n\n\n5.3.2 Web Scraping\nWeb scraping refers to automated techniques for extracting data from websites. It typically involves sending HTTP requests to web servers, retrieving HTML content, and parsing that content to extract relevant information. Popular tools and frameworks for web scraping include Beautiful Soup, Scrapy, and Selenium. These tools offer different functionalities, from parsing HTML content to automating web browser interactions, especially for websites that load content dynamically using JavaScript.\nWeb scraping can effectively gather large datasets for training machine learning models, particularly when human-labeled data is scarce. For computer vision research, web scraping enables the collection of massive volumes of images and videos. Researchers have used this technique to build influential datasets like ImageNet and OpenImages. For example, one could scrape e-commerce sites to amass product photos for object recognition or social media platforms to collect user uploads for facial analysis. Even before ImageNet, Stanford’s LabelMe project scraped Flickr for over 63,000 annotated images covering hundreds of object categories.\nBeyond computer vision, web scraping supports gathering textual data for natural language tasks. Researchers can scrape news sites for sentiment analysis data, forums and review sites for dialogue systems research, or social media for topic modeling. For example, the training data for chatbot ChatGPT was obtained by scraping much of the public Internet. GitHub repositories were scraped to train GitHub’s Copilot AI coding assistant.\nWeb scraping can also collect structured data, such as stock prices, weather data, or product information, for analytical applications. Once data is scraped, it is essential to store it in a structured manner, often using databases or data warehouses. Proper data management ensures the usability of the scraped data for future analysis and applications.\nHowever, while web scraping offers numerous advantages, there are significant limitations and ethical considerations to bear. Not all websites permit scraping, and violating these restrictions can lead to legal repercussions. Scraping copyrighted material or private communications is also unethical and potentially illegal. Ethical web scraping mandates adherence to a website’s ‘robots.txt’ file, which outlines the sections of the site that can be accessed and scraped by automated bots.\nTo deter automated scraping, many websites implement rate limits. If a bot sends too many requests in a short period, it might be temporarily blocked, restricting the speed of data access. Additionally, the dynamic nature of web content means that data scraped at different intervals might need more consistency, posing challenges for longitudinal studies. However, there are emerging trends like Web Navigation where machine learning algorithms can automatically navigate the website to access the dynamic content.\nThe volume of pertinent data available for scraping might be limited for niche subjects. For example, while scraping for common topics like images of cats and dogs might yield abundant data, searching for rare medical conditions might be less fruitful. Moreover, the data obtained through scraping is often unstructured and noisy, necessitating thorough preprocessing and cleaning. It is crucial to understand that not all scraped data will be of high quality or accuracy. Employing verification methods, such as cross-referencing with alternate data sources, can enhance data reliability.\nPrivacy concerns arise when scraping personal data, emphasizing the need for anonymization. Therefore, it is paramount to adhere to a website’s Terms of Service, confine data collection to public domains, and ensure the anonymity of any personal data acquired.\nWhile web scraping can be a scalable method to amass large training datasets for AI systems, its applicability is confined to specific data types. For example, web scraping makes sourcing data for Inertial Measurement Units (IMU) for gesture recognition more complex. At most, one can scrape an existing dataset.\nWeb scraping can yield inconsistent or inaccurate data. For example, the photo in Figure 5.4 shows up when you search for ‘traffic light’ on Google Images. It is an image from 1914 that shows outdated traffic lights, which are also barely discernible because of the image’s poor quality. This can be problematic for web-scraped datasets, as it pollutes the dataset with inapplicable (old) data samples.\n\n\n\n\n\n\nFigure 5.4: A picture of old traffic lights (1914). Credit: Vox.\n\n\n\n\n\n\n\n\n\nExercise 5.2: Web Scraping\n\n\n\n\n\nDiscover the power of web scraping with Python using libraries like Beautiful Soup and Pandas. This exercise will scrape Python documentation for function names and descriptions and explore NBA player stats. By the end, you’ll have the skills to extract and analyze data from real-world websites. Ready to dive in? Access the Google Colab notebook below and start practicing!\n\n\n\n\n\n\n5.3.3 Crowdsourcing\nCrowdsourcing for datasets is the practice of obtaining data using the services of many people, either from a specific community or the general public, typically via the Internet. Instead of relying on a small team or specific organization to collect or label data, crowdsourcing leverages the collective effort of a vast, distributed group of participants. Services like Amazon Mechanical Turk enable the distribution of annotation tasks to a large, diverse workforce. This facilitates the collection of labels for complex tasks like sentiment analysis or image recognition requiring human judgment.\nCrowdsourcing has emerged as an effective approach for data collection and problem-solving. One major advantage of crowdsourcing is scalability—by distributing tasks to a large, global pool of contributors on digital platforms, projects can process huge volumes of data quickly. This makes crowdsourcing ideal for large-scale data labeling, collection, and analysis.\nIn addition, crowdsourcing taps into a diverse group of participants, bringing a wide range of perspectives, cultural insights, and language abilities that can enrich data and enhance creative problem-solving in ways that a more homogenous group may not. Because crowdsourcing draws from a large audience beyond traditional channels, it is more cost-effective than conventional methods, especially for simpler microtasks.\nCrowdsourcing platforms also allow for great flexibility, as task parameters can be adjusted in real time based on initial results. This creates a feedback loop for iterative improvements to the data collection process. Complex jobs can be broken down into microtasks and distributed to multiple people, with results cross-validated by assigning redundant versions of the same task. When thoughtfully managed, crowdsourcing enables community engagement around a collaborative project, where participants find reward in contributing.\nHowever, while crowdsourcing offers numerous advantages, it’s essential to approach it with a clear strategy. While it provides access to a diverse set of annotators, it also introduces variability in the quality of annotations. Additionally, platforms like Mechanical Turk might not always capture a complete demographic spectrum; often, tech-savvy individuals are overrepresented, while children and older people may be underrepresented. Providing clear instructions and training for the annotators is crucial. Periodic checks and validations of the labeled data help maintain quality. This ties back to the topic of clear Problem Definition that we discussed earlier. Crowdsourcing for datasets also requires careful attention to ethical considerations. It’s crucial to ensure that participants are informed about how their data will be used and that their privacy is protected. Quality control through detailed protocols, transparency in sourcing, and auditing is essential to ensure reliable outcomes.\nFor TinyML, crowdsourcing can pose some unique challenges. TinyML devices are highly specialized for particular tasks within tight constraints. As a result, the data they require tends to be very specific. Obtaining such specialized data from a general audience may be difficult through crowdsourcing. For example, TinyML applications often rely on data collected from certain sensors or hardware. Crowdsourcing would require participants to have access to very specific and consistent devices - like microphones, with the same sampling rates. These hardware nuances present obstacles even for simple audio tasks like keyword spotting.\nBeyond hardware, the data itself needs high granularity and quality, given the limitations of TinyML. It can be hard to ensure this when crowdsourcing from those unfamiliar with the application’s context and requirements. There are also potential issues around privacy, real-time collection, standardization, and technical expertise. Moreover, the narrow nature of many TinyML tasks makes accurate data labeling easier with the proper understanding. Participants may need full context to provide reliable annotations.\nThus, while crowdsourcing can work well in many cases, the specialized needs of TinyML introduce unique data challenges. Careful planning is required for guidelines, targeting, and quality control. For some applications, crowdsourcing may be feasible, but others may require more focused data collection efforts to obtain relevant, high-quality training data.\n\n\n5.3.4 Synthetic Data\nSynthetic data generation can be useful for addressing some of the data collection limitations. It involves creating data that wasn’t originally captured or observed but is generated using algorithms, simulations, or other techniques to resemble real-world data. As shown in Figure 5.5, synthetic data is merged with historical data and then used as input for model training. It has become a valuable tool in various fields, particularly when real-world data is scarce, expensive, or ethically challenging (e.g., TinyML). Various techniques, such as Generative Adversarial Networks (GANs), can produce high-quality synthetic data almost indistinguishable from real data. These techniques have advanced significantly, making synthetic data generation increasingly realistic and reliable.\nMore real-world data may need to be available for analysis or training machine learning models in many domains, especially emerging ones. Synthetic data can fill this gap by producing large volumes of data that mimic real-world scenarios. For instance, detecting the sound of breaking glass might be challenging in security applications where a TinyML device is trying to identify break-ins. Collecting real-world data would require breaking numerous windows, which is impractical and costly.\nMoreover, having a diverse dataset is crucial in machine learning, especially in deep learning. Synthetic data can augment existing datasets by introducing variations, thereby enhancing the robustness of models. For example, SpecAugment is an excellent data augmentation technique for Automatic Speech Recognition (ASR) systems.\nPrivacy and confidentiality are also big issues. Datasets containing sensitive or personal information pose privacy concerns when shared or used. Synthetic data, being artificially generated, doesn’t have these direct ties to real individuals, allowing for safer use while preserving essential statistical properties.\nGenerating synthetic data, especially once the generation mechanisms have been established, can be a more cost-effective alternative. Synthetic data eliminates the need to break multiple windows to gather relevant data in the security above application scenario.\nMany embedded use cases deal with unique situations, such as manufacturing plants, that are difficult to simulate. Synthetic data allows researchers complete control over the data generation process, enabling the creation of specific scenarios or conditions that are challenging to capture in real life.\nWhile synthetic data offers numerous advantages, it is essential to use it judiciously. Care must be taken to ensure that the generated data accurately represents the underlying real-world distributions and does not introduce unintended biases.\n\n\n\n\n\n\nFigure 5.5: Increasing training data size with synthetic data generation. Credit: AnyLogic.\n\n\n\n\n\n\n\n\n\nExercise 5.3: Synthetic Data\n\n\n\n\n\nLet us learn about synthetic data generation using Generative Adversarial Networks (GANs) on tabular data. We’ll take a hands-on approach, diving into the workings of the CTGAN model and applying it to the Synthea dataset from the healthcare domain. From data preprocessing to model training and evaluation, we’ll go step-by-step, learning how to create synthetic data, assess its quality, and unlock the potential of GANs for data augmentation and real-world applications.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -440,7 +440,7 @@ "section": "5.4 Data Storage", "text": "5.4 Data Storage\nData sourcing and data storage go hand in hand, and data must be stored in a format that facilitates easy access and processing. Depending on the use case, various kinds of data storage systems can be used to store your datasets. Some examples are shown in Table 5.1.\n\n\n\nTable 5.1: Comparative overview of the database, data warehouse, and data lake.\n\n\n\n\n\n\n\n\n\n\nDatabase\nData Warehouse\nData Lake\n\n\n\n\nPurpose\nOperational and transactional\nAnalytical\n\n\nData type\nStructured\nStructured, semi-structured, and/or unstructured\n\n\nScale\nSmall to large volumes of data\nLarge volumes of integrated data\n\n\nExamples\nMySQL\nGoogle BigQuery, Amazon Redshift, Microsoft Azure Synapse, Google Cloud Storage, AWS S3, Azure Data Lake Storage\n\n\n\n\n\n\nThe stored data is often accompanied by metadata, defined as ’data about data .’It provides detailed contextual information about the data, such as means of data creation, time of creation, attached data use license, etc. For example, Hugging Face has Dataset Cards. To promote responsible data use, dataset creators should disclose potential biases through the dataset cards. These cards can educate users about a dataset’s contents and limitations. The cards also give vital context on appropriate dataset usage by highlighting biases and other important details. Having this type of metadata can also allow fast retrieval if structured properly. Once the model is developed and deployed to edge devices, the storage systems can continue to store incoming data, model updates, or analytical results.\nData Governance: With a large amount of data storage, it is also imperative to have policies and practices (i.e., data governance) that help manage data during its life cycle, from acquisition to disposal. Data governance frames how data is managed and includes making pivotal decisions about data access and control. Figure 5.6 illustrates the different domains involved in data governance. It involves exercising authority and making decisions concerning data to uphold its quality, ensure compliance, maintain security, and derive value. Data governance is operationalized by developing policies, incentives, and penalties, cultivating a culture that perceives data as a valuable asset. Specific procedures and assigned authorities are implemented to safeguard data quality and monitor its utilization and related risks.\nData governance utilizes three integrative approaches: planning and control, organizational, and risk-based.\n\nThe planning and control approach, common in IT, aligns business and technology through annual cycles and continuous adjustments, focusing on policy-driven, auditable governance.\nThe organizational approach emphasizes structure, establishing authoritative roles like Chief Data Officers and ensuring responsibility and accountability in governance.\nThe risk-based approach, intensified by AI advancements, focuses on identifying and managing inherent risks in data and algorithms. It especially addresses AI-specific issues through regular assessments and proactive risk management strategies, allowing for incidental and preventive actions to mitigate undesired algorithm impacts.\n\n\n\n\n\n\n\nFigure 5.6: An overview of the data governance framework. Credit: StarCIO..\n\n\n\nSome examples of data governance across different sectors include:\n\nMedicine: Health Information Exchanges(HIEs) enable the sharing of health information across different healthcare providers to improve patient care. They implement strict data governance practices to maintain data accuracy, integrity, privacy, and security, complying with regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Governance policies ensure that patient data is only shared with authorized entities and that patients can control access to their information.\nFinance: Basel III Framework is an international regulatory framework for banks. It ensures that banks establish clear policies, practices, and responsibilities for data management, ensuring data accuracy, completeness, and timeliness. Not only does it enable banks to meet regulatory compliance, but it also prevents financial crises by more effectively managing risks.\nGovernment: Government agencies managing citizen data, public records, and administrative information implement data governance to manage data transparently and securely. The Social Security System in the US and the Aadhar system in India are good examples of such governance systems.\n\nSpecial data storage considerations for TinyML\nEfficient Audio Storage Formats: Keyword spotting systems need specialized audio storage formats to enable quick keyword searching in audio data. Traditional formats like WAV and MP3 store full audio waveforms, which require extensive processing to search through. Keyword spotting uses compressed storage optimized for snippet-based search. One approach is to store compact acoustic features instead of raw audio. Such a workflow would involve:\n\nExtracting acoustic features: Mel-frequency cepstral coefficients (MFCCs) commonly represent important audio characteristics.\nCreating Embeddings: Embeddings transform extracted acoustic features into continuous vector spaces, enabling more compact and representative data storage. This representation is essential in converting high-dimensional data, like audio, into a more manageable and efficient format for computation and storage.\nVector quantization: This technique represents high-dimensional data, like embeddings, with lower-dimensional vectors, reducing storage needs. Initially, a codebook is generated from the training data to define a set of code vectors representing the original data vectors. Subsequently, each data vector is matched to the nearest codeword according to the codebook, ensuring minimal information loss.\nSequential storage: The audio is fragmented into short frames, and the quantized features (or embeddings) for each frame are stored sequentially to maintain the temporal order, preserving the coherence and context of the audio data.\n\nThis format enables decoding the features frame-by-frame for keyword matching. Searching the features is faster than decompressing the full audio.\nSelective Network Output Storage: Another technique for reducing storage is to discard the intermediate audio features stored during training but not required during inference. The network is run on full audio during training. However, only the final outputs are stored during inference.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -451,7 +451,7 @@ "section": "5.5 Data Processing", "text": "5.5 Data Processing\nData processing refers to the steps involved in transforming raw data into a format suitable for feeding into machine learning algorithms. It is a crucial stage in any ML workflow, yet often overlooked. With proper data processing, ML models are likely to achieve optimal performance. Figure 5.7 shows a breakdown of a data scientist’s time allocation, highlighting the significant portion spent on data cleaning and organizing (%60).\n\n\n\n\n\n\nFigure 5.7: Data scientists’ tasks breakdown by time spent. Credit: Forbes.\n\n\n\nProper data cleaning is a crucial step that directly impacts model performance. Real-world data is often dirty, containing errors, missing values, noise, anomalies, and inconsistencies. Data cleaning involves detecting and fixing these issues to prepare high-quality data for modeling. By carefully selecting appropriate techniques, data scientists can improve model accuracy, reduce overfitting, and enable algorithms to learn more robust patterns. Overall, thoughtful data processing allows machine learning systems to uncover insights better and make predictions from real-world data.\nData often comes from diverse sources and can be unstructured or semi-structured. Thus, processing and standardizing it is essential, ensuring it adheres to a uniform format. Such transformations may include:\n\nNormalizing numerical variables\nEncoding categorical variables\nUsing techniques like dimensionality reduction\n\nData validation serves a broader role than ensuring adherence to certain standards, like preventing temperature values from falling below absolute zero. These issues arise in TinyML because sensors may malfunction or temporarily produce incorrect readings; such transients are not uncommon. Therefore, it is imperative to catch data errors early before propagating through the data pipeline. Rigorous validation processes, including verifying the initial annotation practices, detecting outliers, and handling missing values through techniques like mean imputation, contribute directly to the quality of datasets. This, in turn, impacts the performance, fairness, and safety of the models trained on them. Let’s take a look at Figure 5.8 for an example of a data processing pipeline. In the context of TinyML, the Multilingual Spoken Words Corpus (MSWC) is an example of data processing pipelines—systematic and automated workflows for data transformation, storage, and processing. The input data (which’s a collection of short recordings) goes through sevreral phases of processing, such as audio-word alignemnt and keyword extraction. By streamlining the data flow, from raw data to usable datasets, data pipelines enhance productivity and facilitate the rapid development of machine learning models. The MSWC is an expansive and expanding collection of audio recordings of spoken words in 50 different languages, which are collectively used by over 5 billion people. This dataset is intended for academic study and business uses in areas like keyword identification and speech-based search. It is openly licensed under Creative Commons Attribution 4.0 for broad usage.\n\n\n\n\n\n\nFigure 5.8: An overview of the Multilingual Spoken Words Corpus (MSWC) data processing pipeline. Credit: Mazumder et al. (2021).\n\n\nMazumder, Mark, Sharad Chitlangia, Colby Banbury, Yiping Kang, Juan Manuel Ciro, Keith Achorn, Daniel Galvez, et al. 2021. “Multilingual Spoken Words Corpus.” In Thirty-Fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).\n\n\nThe MSWC used a forced alignment method to automatically extract individual word recordings to train keyword-spotting models from the Common Voice project, which features crowdsourced sentence-level recordings. Forced alignment refers to long-standing methods in speech processing that predict when speech phenomena like syllables, words, or sentences start and end within an audio recording. In the MSWC data, crowdsourced recordings often feature background noises, such as static and wind. Depending on the model’s requirements, these noises can be removed or intentionally retained.\nMaintaining the integrity of the data infrastructure is a continuous endeavor. This encompasses data storage, security, error handling, and stringent version control. Periodic updates are crucial, especially in dynamic realms like keyword spotting, to adjust to evolving linguistic trends and device integrations.\nThere is a boom in data processing pipelines, commonly found in ML operations toolchains, which we will discuss in the MLOps chapter. Briefly, these include frameworks like MLOps by Google Cloud. It provides methods for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management. Several mechanisms focus on data processing, an integral part of these systems.\n\n\n\n\n\n\nExercise 5.4: Data Processing\n\n\n\n\n\nLet us explore two significant projects in speech data processing and machine learning. The MSWC is a vast audio dataset with over 340,000 keywords and 23.4 million 1-second spoken examples. It’s used in various applications like voice-enabled devices and call center automation. The Few-Shot Keyword Spotting project introduces a new approach for keyword spotting across different languages, achieving impressive results with minimal training data. We’ll delve into the MSWC dataset, learn how to structure it effectively, and then train a few-shot keyword-spotting model. Let’s get started!", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -462,7 +462,7 @@ "section": "5.6 Data Labeling", "text": "5.6 Data Labeling\nData labeling is important in creating high-quality training datasets for machine learning models. Labels provide ground truth information, allowing models to learn relationships between inputs and desired outputs. This section covers key considerations for selecting label types, formats, and content to capture the necessary information for tasks. It discusses common annotation approaches, from manual labeling to crowdsourcing to AI-assisted methods, and best practices for ensuring label quality through training, guidelines, and quality checks. We also emphasize the ethical treatment of human annotators. The integration of AI to accelerate and augment human annotation is also explored. Understanding labeling needs, challenges, and strategies are essential for constructing reliable, useful datasets to train performant, trustworthy machine learning systems.\n\n5.6.1 Label Types\nLabels capture information about key tasks or concepts. Figure 5.9 includes some common label types: a “classification label” is used for categorizing images with labels (labeling an image with “dog” if it features a dog); a “bounding box” identifies object location (drawing a box around the dog); a “segmentation map” classifies objects at the pixel level (highlighting the dog in a distinct color); a “caption” provides descriptive annotations (describing the dog’s actions, position, color, etc.); and a “transcript” denotes audio content. The choice of label format depends on the use case and resource constraints, as more detailed labels require greater effort to collect (Johnson-Roberson et al. (2017)).\n\nJohnson-Roberson, Matthew, Charles Barto, Rounak Mehta, Sharath Nittur Sridhar, Karl Rosaen, and Ram Vasudevan. 2017. “Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks?” In 2017 IEEE International Conference on Robotics and Automation (ICRA), 746–53. Singapore, Singapore: IEEE. https://doi.org/10.1109/icra.2017.7989092.\n\n\n\n\n\n\nFigure 5.9: An overview of common label types.\n\n\n\nUnless focused on self-supervised learning, a dataset will likely provide labels addressing one or more tasks of interest. Given their unique resource constraints, dataset creators must consider what information labels should capture and how they can practically obtain the necessary labels. Creators must first decide what type(s) of content labels should capture. For example, a creator interested in car detection would want to label cars in their dataset. Still, they might also consider whether to simultaneously collect labels for other tasks that the dataset could potentially be used for, such as pedestrian detection.\nAdditionally, annotators can provide metadata that provides insight into how the dataset represents different characteristics of interest (see Section 5.9). The Common Voice dataset, for example, includes various types of metadata that provide information about the speakers, recordings, and dataset quality for each language represented (Ardila et al. (2020)). They include demographic splits showing the number of recordings by speaker age range and gender. This allows us to see who contributed recordings for each language. They also include statistics like average recording duration and total hours of validated recordings. These give insights into the nature and size of the datasets for each language. Additionally, quality control metrics like the percentage of recordings that have been validated are useful to know how complete and clean the datasets are. The metadata also includes normalized demographic splits scaled to 100% for comparison across languages. This highlights representation differences between higher and lower resource languages.\n\nArdila, Rosana, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer, Michael Henretty, Reuben Morais, Lindsay Saunders, Francis Tyers, and Gregor Weber. 2020. “Common Voice: A Massively-Multilingual Speech Corpus.” In Proceedings of the Twelfth Language Resources and Evaluation Conference, 4218–22. Marseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.520.\nNext, creators must determine the format of those labels. For example, a creator interested in car detection might choose between binary classification labels that say whether a car is present, bounding boxes that show the general locations of any cars, or pixel-wise segmentation labels that show the exact location of each car. Their choice of label format may depend on their use case and resource constraints, as finer-grained labels are typically more expensive and time-consuming to acquire.\n\n\n5.6.2 Annotation Methods\nCommon annotation approaches include manual labeling, crowdsourcing, and semi-automated techniques. Manual labeling by experts yields high quality but needs more scalability. Crowdsourcing enables non-experts to distribute annotation, often through dedicated platforms (Sheng and Zhang (2019)). Weakly supervised and programmatic methods can reduce manual effort by heuristically or automatically generating labels (Ratner et al. (2018)).\n\nSheng, Victor S., and Jing Zhang. 2019. “Machine Learning with Crowdsourcing: A Brief Summary of the Past Research and Future Directions.” Proceedings of the AAAI Conference on Artificial Intelligence 33 (01): 9837–43. https://doi.org/10.1609/aaai.v33i01.33019837.\n\nRatner, Alex, Braden Hancock, Jared Dunnmon, Roger Goldman, and Christopher Ré. 2018. “Snorkel MeTaL: Weak Supervision for Multi-Task Learning.” In Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning. ACM. https://doi.org/10.1145/3209889.3209898.\nAfter deciding on their labels’ desired content and format, creators begin the annotation process. To collect large numbers of labels from human annotators, creators frequently rely on dedicated annotation platforms, which can connect them to teams of human annotators. When using these platforms, creators may need more insight into annotators’ backgrounds and experience levels with topics of interest. However, some platforms offer access to annotators with specific expertise (e.g., doctors).\n\n\n5.6.3 Ensuring Label Quality\nThere is no guarantee that the data labels are actually correct. Figure 5.10 shows some examples of hard labeling cases: some errors arise from blurred pictures that make them hard to identify (the frog image), and others stem from a lack of domain knowledge (the black stork case). It is possible that despite the best instructions being given to labelers, they still mislabel some images (Northcutt, Athalye, and Mueller (2021)). Strategies like quality checks, training annotators, and collecting multiple labels per datapoint can help ensure label quality. For ambiguous tasks, multiple annotators can help identify controversial datapoints and quantify disagreement levels.\n\n\n\n\n\n\nFigure 5.10: Some examples of hard labeling cases. Credit: Northcutt, Athalye, and Mueller (2021).\n\n\nNorthcutt, Curtis G, Anish Athalye, and Jonas Mueller. 2021. “Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks.” arXiv. https://doi.org/https://doi.org/10.48550/arXiv.2103.14749 arXiv-issued DOI via DataCite.\n\n\nWhen working with human annotators, offering fair compensation and otherwise prioritizing ethical treatment is important, as annotators can be exploited or otherwise harmed during the labeling process (Perrigo, 2023). For example, if a dataset is likely to contain disturbing content, annotators may benefit from having the option to view images in grayscale (Google (n.d.)).\n\nGoogle. n.d. “Information Quality Content Moderation.” https://blog.google/documents/83/.\n\n\n5.6.4 AI-Assisted Annotation\nML has an insatiable demand for data. Therefore, more data is needed. This raises the question of how we can get more labeled data. Rather than always generating and curating data manually, we can rely on existing AI models to help label datasets more quickly and cheaply, though often with lower quality than human annotation. This can be done in various ways as shown in Figure 5.11, including the following:\n\nPre-annotation: AI models can generate preliminary labels for a dataset using methods such as semi-supervised learning (Chapelle, Scholkopf, and Zien (2009)), which humans can then review and correct. This can save a significant amount of time, especially for large datasets.\nActive learning: AI models can identify the most informative data points in a dataset, which can then be prioritized for human annotation. This can help improve the labeled dataset’s quality while reducing the overall annotation time.\nQuality control: AI models can identify and flag potential errors in human annotations, helping to ensure the accuracy and consistency of the labeled dataset.\n\n\nChapelle, O., B. Scholkopf, and A. Zien Eds. 2009. “Semi-Supervised Learning (Chapelle, O. Et Al., Eds.; 2006) [Book Reviews].” IEEE Trans. Neural Networks 20 (3): 542–42. https://doi.org/10.1109/tnn.2009.2015974.\nHere are some examples of how AI-assisted annotation has been proposed to be useful:\n\nMedical imaging: AI-assisted annotation labels medical images, such as MRI scans and X-rays (Krishnan, Rajpurkar, and Topol (2022)). Carefully annotating medical datasets is extremely challenging, especially at scale, since domain experts are scarce and become costly. This can help to train AI models to diagnose diseases and other medical conditions more accurately and efficiently.\n\nSelf-driving cars: AI-assisted annotation is being used to label images and videos from self-driving cars. This can help to train AI models to identify objects on the road, such as other vehicles, pedestrians, and traffic signs.\nSocial media: AI-assisted annotation labels social media posts like images and videos. This can help to train AI models to identify and classify different types of content, such as news, advertising, and personal posts.\n\n\nKrishnan, Rayan, Pranav Rajpurkar, and Eric J. Topol. 2022. “Self-Supervised Learning in Medicine and Healthcare.” Nat. Biomed. Eng. 6 (12): 1346–52. https://doi.org/10.1038/s41551-022-00914-1.\n\n\n\n\n\n\nFigure 5.11: Strategies for acquiring additional labeled training data. Credit: Standford AI Lab.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -473,7 +473,7 @@ "section": "5.7 Data Version Control", "text": "5.7 Data Version Control\nProduction systems are perpetually inundated with fluctuating and escalating volumes of data, prompting the rapid emergence of numerous data replicas. This increasing data serves as the foundation for training machine learning models. For instance, a global sales company engaged in sales forecasting continuously receives consumer behavior data. Similarly, healthcare systems formulating predictive models for disease diagnosis are consistently acquiring new patient data. TinyML applications, such as keyword spotting, are highly data-hungry regarding the amount of data generated. Consequently, meticulous tracking of data versions and the corresponding model performance is imperative.\nData Version Control offers a structured methodology to handle alterations and versions of datasets efficiently. It facilitates monitoring modifications, preserves multiple versions, and guarantees reproducibility and traceability in data-centric projects. Furthermore, data version control provides the versatility to review and utilize specific versions as needed, ensuring that each stage of the data processing and model development can be revisited and audited precisely and easily. It has a variety of practical uses -\nRisk Management: Data version control allows transparency and accountability by tracking dataset versions.\nCollaboration and Efficiency: Easy access to different dataset versions in one place can improve data sharing of specific checkpoints and enable efficient collaboration.\nReproducibility: Data version control allows for tracking the performance of models concerning different versions of the data, and therefore enabling reproducibility.\nKey Concepts\n\nCommits: It is an immutable snapshot of the data at a specific point in time, representing a unique version. Every commit is associated with a unique identifier to allow\nBranches: Branching allows developers and data scientists to diverge from the main development line and continue to work independently without affecting other branches. This is especially useful when experimenting with new features or models, enabling parallel development and experimentation without the risk of corrupting the stable main branch.\nMerges: Merges help to integrate changes from different branches while maintaining the integrity of the data.\n\nWith data version control in place, we can track the changes shown in Figure 5.12, reproduce previous results by reverting to older versions, and collaborate safely by branching off and isolating the changes.\n\n\n\n\n\n\nFigure 5.12: Data versioning.\n\n\n\nPopular Data Version Control Systems\nDVC: It stands for Data Version Control in short and is an open-source, lightweight tool that works on top of Git Hub and supports all kinds of data formats. It can seamlessly integrate into the workflow if Git is used to manage code. It captures the versions of data and models in the Git commits while storing them on-premises or on the cloud (e.g., AWS, Google Cloud, Azure). These data and models (e.g., ML artifacts) are defined in the metadata files, which get updated in every commit. It can allow metrics tracking of models on different versions of the data.\nlakeFS: It is an open-source tool that supports the data version control on data lakes. It supports many git-like operations, such as branching and merging of data, as well as reverting to previous versions of the data. It also has a unique UI feature, making exploring and managing data much easier.\nGit LFS: It is useful for data version control on smaller-sized datasets. It uses Git’s inbuilt branching and merging features but is limited in tracking metrics, reverting to previous versions, or integrating with data lakes.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -484,7 +484,7 @@ "section": "5.8 Optimizing Data for Embedded AI", "text": "5.8 Optimizing Data for Embedded AI\nCreators working on embedded systems may have unusual priorities when cleaning their datasets. On the one hand, models may be developed for unusually specific use cases, requiring heavy filtering of datasets. While other natural language models may be capable of turning any speech into text, a model for an embedded system may be focused on a single limited task, such as detecting a keyword. As a result, creators may aggressively filter out large amounts of data because they need to address the task of interest. An embedded AI system may also be tied to specific hardware devices or environments. For example, a video model may need to process images from a single type of camera, which will only be mounted on doorbells in residential neighborhoods. In this scenario, creators may discard images if they came from a different kind of camera, show the wrong type of scenery, or were taken from the wrong height or angle.\nOn the other hand, embedded AI systems are often expected to provide especially accurate performance in unpredictable real-world settings. This may lead creators to design datasets to represent variations in potential inputs and promote model robustness. As a result, they may define a narrow scope for their project but then aim for deep coverage within those bounds. For example, creators of the doorbell model mentioned above might try to cover variations in data arising from:\n\nGeographically, socially, and architecturally diverse neighborhoods\nDifferent types of artificial and natural lighting\nDifferent seasons and weather conditions\nObstructions (e.g., raindrops or delivery boxes obscuring the camera’s view)\n\nAs described above, creators may consider crowdsourcing or synthetically generating data to include these variations.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -495,7 +495,7 @@ "section": "5.9 Data Transparency", "text": "5.9 Data Transparency\nBy providing clear, detailed documentation, creators can help developers understand how best to use their datasets. Several groups have suggested standardized documentation formats for datasets, such as Data Cards (Pushkarna, Zaldivar, and Kjartansson (2022)), datasheets (Gebru et al. (2021)), data statements (Bender and Friedman (2018)), or Data Nutrition Labels (Holland et al. (2020)). When releasing a dataset, creators may describe what kinds of data they collected, how they collected and labeled it, and what kinds of use cases may be a good or poor fit for the dataset. Quantitatively, it may be appropriate to show how well the dataset represents different groups (e.g., different gender groups, different cameras).\n\nGebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. “Datasheets for Datasets.” Commun. ACM 64 (12): 86–92. https://doi.org/10.1145/3458723.\n\nBender, Emily M., and Batya Friedman. 2018. “Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science.” Transactions of the Association for Computational Linguistics 6 (December): 587–604. https://doi.org/10.1162/tacl_a_00041.\n\nHolland, Sarah, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2020. “The Dataset Nutrition Label: A Framework to Drive Higher Data Quality Standards.” In Data Protection and Privacy. Hart Publishing. https://doi.org/10.5040/9781509932771.ch-001.\nFigure 5.13 shows an example of a data card for a computer vision (CV) dataset. It includes some basic information about the dataset and instructions on how to use it, including known biases.\n\n\n\n\n\n\nFigure 5.13: Data card describing a CV dataset. Credit: Pushkarna, Zaldivar, and Kjartansson (2022).\n\n\nPushkarna, Mahima, Andrew Zaldivar, and Oddur Kjartansson. 2022. “Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI.” In 2022 ACM Conference on Fairness, Accountability, and Transparency. ACM. https://doi.org/10.1145/3531146.3533231.\n\n\nKeeping track of data provenance- essentially the origins and the journey of each data point through the data pipeline- is not merely a good practice but an essential requirement for data quality. Data provenance contributes significantly to the transparency of machine learning systems. Transparent systems make it easier to scrutinize data points, enabling better identification and rectification of errors, biases, or inconsistencies. For instance, if an ML model trained on medical data is underperforming in particular areas, tracing the provenance can help identify whether the issue is with the data collection methods, the demographic groups represented in the data or other factors. This level of transparency doesn’t just help debug the system but also plays a crucial role in enhancing the overall data quality. By improving the reliability and credibility of the dataset, data provenance also enhances the model’s performance and its acceptability among end-users.\nWhen producing documentation, creators should also specify how users can access the dataset and how the dataset will be maintained over time. For example, users may need to undergo training or receive special permission from the creators before accessing a protected information dataset, as with many medical datasets. In some cases, users may not access the data directly. Instead, they must submit their model to be trained on the dataset creators’ hardware, following a federated learning setup (Aledhari et al. (2020)). Creators may also describe how long the dataset will remain accessible, how the users can submit feedback on any errors they discover, and whether there are plans to update the dataset.\n\nAledhari, Mohammed, Rehma Razzak, Reza M. Parizi, and Fahad Saeed. 2020. “Federated Learning: A Survey on Enabling Technologies, Protocols, and Applications.” #IEEE_O_ACC# 8: 140699–725. https://doi.org/10.1109/access.2020.3013541.\nSome laws and regulations also promote data transparency through new requirements for organizations:\n\nGeneral Data Protection Regulation (GDPR) in the European Union: It establishes strict requirements for processing and protecting the personal data of EU citizens. It mandates plain-language privacy policies that clearly explain what data is collected, why it is used, how long it is stored, and with whom it is shared. GDPR also mandates that privacy notices must include details on the legal basis for processing, data transfers, retention periods, rights to access and deletion, and contact info for data controllers.\nCalifornia’s Consumer Privacy Act (CCPA): CCPA requires clear privacy policies and opt-out rights to sell personal data. Significantly, it also establishes rights for consumers to request their specific data be disclosed. Businesses must provide copies of collected personal information and details on what it is used for, what categories are collected, and what third parties receive. Consumers can identify data points they believe need to be more accurate. The law represents a major step forward in empowering personal data access.\n\nEnsured data transparency presents several challenges, especially because it requires significant time and financial resources. Data systems are also quite complex, and full transparency can take time. Full transparency may also overwhelm consumers with too much detail. Finally, it is also important to balance the tradeoff between transparency and privacy.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -506,7 +506,7 @@ "section": "5.10 Licensing", "text": "5.10 Licensing\nMany high-quality datasets either come from proprietary sources or contain copyrighted information. This introduces licensing as a challenging legal domain. Companies eager to train ML systems must engage in negotiations to obtain licenses that grant legal access to these datasets. Furthermore, licensing terms can impose restrictions on data applications and sharing methods. Failure to comply with these licenses can have severe consequences.\nFor instance, ImageNet, one of the most extensively utilized datasets for computer vision research, is a case in point. Most of its images were procured from public online sources without explicit permission, sparking ethical concerns (Prabhu and Birhane, 2020). Accessing the ImageNet dataset for corporations requires registration and adherence to its terms of use, which restricts commercial usage (ImageNet, 2021). Major players like Google and Microsoft invest significantly in licensing datasets to enhance their ML vision systems. However, the cost factor restricts accessibility for researchers from smaller companies with constrained budgets.\nThe legal domain of data licensing has seen major cases that help define fair use parameters. A prominent example is Authors Guild, Inc. v. Google, Inc. This 2005 lawsuit alleged that Google’s book scanning project infringed copyrights by displaying snippets without permission. However, the courts ultimately ruled in Google’s favor, upholding fair use based on the transformative nature of creating a searchable index and showing limited text excerpts. This precedent provides some legal grounds for arguing fair use protections apply to indexing datasets and generating representative samples for machine learning. However, license restrictions remain binding, so a comprehensive analysis of licensing terms is critical. The case demonstrates why negotiations with data providers are important to enable legal usage within acceptable bounds.\nNew Data Regulations and Their Implications\nNew data regulations also impact licensing practices. The legislative landscape is evolving with regulations like the EU’s Artificial Intelligence Act, which is poised to regulate AI system development and use within the European Union (EU). This legislation:\n\nClassifies AI systems by risk.\nMandates development and usage prerequisites.\nEmphasizes data quality, transparency, human oversight, and accountability.\n\nAdditionally, the EU Act addresses the ethical dimensions and operational challenges in sectors such as healthcare and finance. Key elements include the prohibition of AI systems posing \"unacceptable\" risks, stringent conditions for high-risk systems, and minimal obligations for \"limited risk\" AI systems. The proposed European AI Board will oversee and ensure the implementation of efficient regulation.\nChallenges in Assembling ML Training Datasets\nComplex licensing issues around proprietary data, copyright law, and privacy regulations constrain options for assembling ML training datasets. However, expanding accessibility through more open licensing or public-private data collaborations could greatly accelerate industry progress and ethical standards.\nSometimes, certain portions of a dataset may need to be removed or obscured to comply with data usage agreements or protect sensitive information. For example, a dataset of user information may have names, contact details, and other identifying data that may need to be removed from the dataset; this is well after the dataset has already been actively sourced and used for training models. Similarly, a dataset that includes copyrighted content or trade secrets may need to filter out those portions before being distributed. Laws such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the Amended Act on the Protection of Personal Information (APPI) have been passed to guarantee the right to be forgotten. These regulations legally require model providers to erase user data upon request.\nData collectors and providers need to be able to take appropriate measures to de-identify or filter out any proprietary, licensed, confidential, or regulated information as needed. Sometimes, the users may explicitly request that their data be removed.\nThe ability to update the dataset by removing data from the dataset will enable the creators to uphold legal and ethical obligations around data usage and privacy. However, the ability to remove data has some important limitations. We must consider that some models may have already been trained on the dataset, and there is no clear or known way to eliminate a particular data sample’s effect from the trained network. There is no erase mechanism. Thus, this begs the question, should the model be retrained from scratch each time a sample is removed? That’s a costly option. Once data has been used to train a model, simply removing it from the original dataset may not fully eliminate its impact on the model’s behavior. New research is needed around the effects of data removal on already-trained models and whether full retraining is necessary to avoid retaining artifacts of deleted data. This presents an important consideration when balancing data licensing obligations with efficiency and practicality in an evolving, deployed ML system.\nDataset licensing is a multifaceted domain that intersects technology, ethics, and law. Understanding these intricacies becomes paramount for anyone building datasets during data engineering as the world evolves.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -517,7 +517,7 @@ "section": "5.11 Conclusion", "text": "5.11 Conclusion\nData is the fundamental building block of AI systems. Without quality data, even the most advanced machine learning algorithms will fail. Data engineering encompasses the end-to-end process of collecting, storing, processing, and managing data to fuel the development of machine learning models. It begins with clearly defining the core problem and objectives, which guides effective data collection. Data can be sourced from diverse means, including existing datasets, web scraping, crowdsourcing, and synthetic data generation. Each approach involves tradeoffs between cost, speed, privacy, and specificity. Once data is collected, thoughtful labeling through manual or AI-assisted annotation enables the creation of high-quality training datasets. Proper storage in databases, warehouses, or lakes facilitates easy access and analysis. Metadata provides contextual details about the data. Data processing transforms raw data into a clean, consistent format for machine learning model development. Throughout this pipeline, transparency through documentation and provenance tracking is crucial for ethics, auditability, and reproducibility. Data licensing protocols also govern legal data access and use. Key challenges in data engineering include privacy risks, representation gaps, legal restrictions around proprietary data, and the need to balance competing constraints like speed versus quality. By thoughtfully engineering high-quality training data, machine learning practitioners can develop accurate, robust, and responsible AI systems, including embedded and TinyML applications.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -526,9 +526,9 @@ "href": "contents/data_engineering/data_engineering.html#sec-data-engineering-resource", "title": "5  Data Engineering", "section": "5.12 Resources", - "text": "5.12 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nData Engineering: Overview.\nFeature engineering.\nData Standards: Speech Commands.\nCrowdsourcing Data for the Long Tail.\nReusing and Adapting Existing Datasets.\nResponsible Data Collection.\nData Anomaly Detection:\n\nAnamoly Detection: Overview.\nAnamoly Detection: Challenges.\nAnamoly Detection: Datasets.\nAnamoly Detection: using Autoencoders.\n\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 5.1\nExercise 5.2\nExercise 5.3\nExercise 5.4\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "5.12 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nData Engineering: Overview.\nFeature engineering.\nData Standards: Speech Commands.\nCrowdsourcing Data for the Long Tail.\nReusing and Adapting Existing Datasets.\nResponsible Data Collection.\nData Anomaly Detection:\n\nAnamoly Detection: Overview.\nAnamoly Detection: Challenges.\nAnamoly Detection: Datasets.\nAnamoly Detection: using Autoencoders.\n\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 5.1\nExercise 5.2\nExercise 5.3\nExercise 5.4\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Workflow", "5  Data Engineering" ] }, @@ -539,7 +539,7 @@ "section": "", "text": "6.1 Introduction\nMachine learning frameworks provide the tools and infrastructure to efficiently build, train, and deploy machine learning models. In this chapter, we will explore the evolution and key capabilities of major frameworks like TensorFlow (TF), PyTorch, and specialized frameworks for embedded devices. We will dive into the components like computational graphs, optimization algorithms, hardware acceleration, and more that enable developers to construct performant models quickly. Understanding these frameworks is essential to leverage the power of deep learning across the spectrum from cloud to edge devices.\nML frameworks handle much of the complexity of model development through high-level APIs and domain-specific languages that allow practitioners to quickly construct models by combining pre-made components and abstractions. For example, frameworks like TensorFlow and PyTorch provide Python APIs to define neural network architectures using layers, optimizers, datasets, and more. This enables rapid iteration compared to coding every model detail from scratch.\nA key capability framework offered is distributed training engines that can scale model training across clusters of GPUs and TPUs. This makes it feasible to train state-of-the-art models with billions or trillions of parameters on vast datasets. Frameworks also integrate with specialized hardware like NVIDIA GPUs to further accelerate training via optimizations like parallelization and efficient matrix operations.\nIn addition, frameworks simplify deploying finished models into production through tools like TensorFlow Serving for scalable model serving and TensorFlow Lite for optimization on mobile and edge devices. Other valuable capabilities include visualization, model optimization techniques like quantization and pruning, and monitoring metrics during training.\nThey were leading open-source frameworks like TensorFlow, PyTorch, and MXNet, which power much of AI research and development today. Commercial offerings like Amazon SageMaker and Microsoft Azure Machine Learning integrate these open source frameworks with proprietary capabilities and enterprise tools.\nMachine learning engineers and practitioners leverage these robust frameworks to focus on high-value tasks like model architecture, feature engineering, and hyperparameter tuning instead of infrastructure. The goal is to build and deploy performant models that solve real-world problems efficiently.\nThis chapter, we will explore today’s leading cloud frameworks and how they have adapted models and tools specifically for embedded and edge deployment. We will compare programming models, supported hardware, optimization capabilities, and more to fully understand how frameworks enable scalable machine learning from the cloud to the edge.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -550,7 +550,7 @@ "section": "6.2 Framework Evolution", "text": "6.2 Framework Evolution\nMachine learning frameworks have evolved significantly to meet the diverse needs of machine learning practitioners and advancements in AI techniques. A few decades ago, building and training machine learning models required extensive low-level coding and infrastructure. Machine learning frameworks have evolved considerably over the past decade to meet the expanding needs of practitioners and rapid advances in deep learning techniques. Insufficient data and computing power constrained early neural network research. Building and training machine learning models required extensive low-level coding and infrastructure. However, the release of large datasets like ImageNet (Deng et al. 2009) and advancements in parallel GPU computing unlocked the potential for far deeper neural networks.\n\nDeng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. 2009. “ImageNet: A Large-Scale Hierarchical Image Database.” In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–55. IEEE. https://doi.org/10.1109/cvpr.2009.5206848.\n\nTeam, The Theano Development, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, et al. 2016. “Theano: A Python Framework for Fast Computation of Mathematical Expressions.” https://arxiv.org/abs/1605.02688.\n\nJia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. “Caffe: Convolutional Architecture for Fast Feature Embedding.” In Proceedings of the 22nd ACM International Conference on Multimedia, 675–78. ACM. https://doi.org/10.1145/2647868.2654889.\n\nKrizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake Tahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, 1106–14. https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.\n\nChollet, François. 2018. “Introduction to Keras.” March 9th.\n\nTokui, Seiya, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, and Hiroyuki Yamazaki Vincent. 2019. “Chainer: A Deep Learning Framework for Accelerating the Research Cycle.” In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &Amp; Data Mining, 5:1–6. ACM. https://doi.org/10.1145/3292500.3330756.\n\nSeide, Frank, and Amit Agarwal. 2016. “Cntk: Microsoft’s Open-Source Deep-Learning Toolkit.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2135–35. ACM. https://doi.org/10.1145/2939672.2945397.\n\nAnsel, Jason, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, et al. 2024. “PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation.” In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, edited by Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, 8024–35. ACM. https://doi.org/10.1145/3620665.3640366.\nThe first ML frameworks, Theano by Team et al. (2016) and Caffe by Jia et al. (2014), were developed by academic institutions (Montreal Institute for Learning Algorithms, Berkeley Vision and Learning Center). Amid growing interest in deep learning due to state-of-the-art performance of AlexNet Krizhevsky, Sutskever, and Hinton (2012) on the ImageNet dataset, private companies and individuals began developing ML frameworks, resulting in frameworks such as Keras by Chollet (2018), Chainer by Tokui et al. (2019), TensorFlow from Google (Yu et al. 2018), CNTK by Microsoft (Seide and Agarwal 2016), and PyTorch by Facebook (Ansel et al. 2024).\nMany of these ML frameworks can be divided into high-level vs. low-level frameworks and static vs. dynamic computational graph frameworks. High-level frameworks provide a higher level of abstraction than low-level frameworks. High-level frameworks have pre-built functions and modules for common ML tasks, such as creating, training, and evaluating common ML models, preprocessing data, engineering features, and visualizing data, which low-level frameworks do not have. Thus, high-level frameworks may be easier to use but are less customizable than low-level frameworks (i.e., users of low-level frameworks can define custom layers, loss functions, optimization algorithms, etc.). Examples of high-level frameworks include TensorFlow/Keras and PyTorch. Examples of low-level ML frameworks include TensorFlow with low-level APIs, Theano, Caffe, Chainer, and CNTK.\nFrameworks like Theano and Caffe used static computational graphs, which required rigidly defining the full model architecture upfront. Static graphs require upfront declaration and limit flexibility. Dynamic graphs are constructed on the fly for more iterative development. However, around 2016, frameworks began adopting dynamic graphs like PyTorch and TensorFlow 2.0, which can construct graphs on the fly. This provides greater flexibility for model development. We will discuss these concepts and details later in the AI Training section.\nThe development of these frameworks facilitated an explosion in model size and complexity over time—from early multilayer perceptrons and convolutional networks to modern transformers with billions or trillions of parameters. In 2016, ResNet models by He et al. (2016) achieved record ImageNet accuracy with over 150 layers and 25 million parameters. Then, in 2020, the GPT-3 language model from OpenAI (Brown et al. 2020) pushed parameters to an astonishing 175 billion using model parallelism in frameworks to train across thousands of GPUs and TPUs.\n\nHe, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. IEEE. https://doi.org/10.1109/cvpr.2016.90.\n\nBrown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.\nEach generation of frameworks unlocked new capabilities that powered advancement:\n\nTheano and TensorFlow (2015) introduced computational graphs and automatic differentiation to simplify model building.\nCNTK (2016) pioneered efficient distributed training by combining model and data parallelism.\nPyTorch (2016) provided imperative programming and dynamic graphs for flexible experimentation.\nTensorFlow 2.0 (2019) defaulted eager execution for intuitiveness and debugging.\nTensorFlow Graphics (2020) added 3D data structures to handle point clouds and meshes.\n\nIn recent years, the frameworks have converged. Figure 6.1 shows that TensorFlow and PyTorch have become the overwhelmingly dominant ML frameworks, representing more than 95% of ML frameworks used in research and production. Keras was integrated into TensorFlow in 2019; Preferred Networks transitioned Chainer to PyTorch in 2019; and Microsoft stopped actively developing CNTK in 2022 to support PyTorch on Windows.\n\n\n\n\n\n\nFigure 6.1: Popularity of ML frameworks in the United States as measured by Google web searches. Credit: Google.\n\n\n\nHowever, a one-size-fits-all approach only works well across the spectrum from cloud to tiny edge devices. Different frameworks represent various philosophies around graph execution, declarative versus imperative APIs, and more. Declaratives define what the program should do, while imperatives focus on how it should be done step-by-step. For instance, TensorFlow uses graph execution and declarative-style modeling, while PyTorch adopts eager execution and imperative modeling for more Pythonic flexibility. Each approach carries tradeoffs, which we will discuss later in the Basic Components section.\nToday’s advanced frameworks enable practitioners to develop and deploy increasingly complex models - a key driver of innovation in the AI field. However, they continue to evolve and expand their capabilities for the next generation of machine learning. To understand how these systems continue to evolve, we will dive deeper into TensorFlow as an example of how the framework grew in complexity over time.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -561,7 +561,7 @@ "section": "6.3 DeepDive into TensorFlow", "text": "6.3 DeepDive into TensorFlow\nTensorFlow was developed by the Google Brain team and was released as an open-source software library on November 9, 2015. It was designed for numerical computation using data flow graphs and has since become popular for a wide range of machine learning and deep learning applications.\nTensorFlow is a training and inference framework that provides built-in functionality to handle everything from model creation and training to deployment, as shown in Figure 6.2. Since its initial development, the TensorFlow ecosystem has grown to include many different “varieties” of TensorFlow, each intended to allow users to support ML on different platforms. In this section, we will mainly discuss only the core package.\n\n6.3.1 TF Ecosystem\n\nTensorFlow Core: primary package that most developers engage with. It provides a comprehensive, flexible platform for defining, training, and deploying machine learning models. It includes tf—keras as its high-level API.\nTensorFlow Lite (https://www.tensorflow.org/lite): designed for deploying lightweight models on mobile, embedded, and edge devices. It offers tools to convert TensorFlow models to a more compact format suitable for limited-resource devices and provides optimized pre-trained models for mobile.\nTensorFlow.js: JavaScript library that allows training and deployment of machine learning models directly in the browser or on Node.js. It also provides tools for porting pre-trained TensorFlow models to the browser-friendly format.\nTensorFlow on Edge Devices (Coral): platform of hardware components and software tools from Google that allows the execution of TensorFlow models on edge devices, leveraging Edge TPUs for acceleration.\nTensorFlow Federated (TFF): framework for machine learning and other computations on decentralized data. TFF facilitates federated learning, allowing model training across many devices without centralizing the data.\nTensorFlow Graphics: library for using TensorFlow to carry out graphics-related tasks, including 3D shapes and point clouds processing, using deep learning.\nTensorFlow Hub: repository of reusable machine learning model components to allow developers to reuse pre-trained model components, facilitating transfer learning and model composition\nTensorFlow Serving: framework designed for serving and deploying machine learning models for inference in production environments. It provides tools for versioning and dynamically updating deployed models without service interruption.\nTensorFlow Extended (TFX): end-to-end platform designed to deploy and manage machine learning pipelines in production settings. TFX encompasses data validation, preprocessing, model training, validation, and serving components.\n\n\n\n\n\n\n\nFigure 6.2: Architecture overview of TensorFlow 2.0. Credit: Tensorflow.\n\n\n\nTensorFlow was developed to address the limitations of DistBelief (Yu et al. 2018)—the framework in use at Google from 2011 to 2015—by providing flexibility along three axes: 1) defining new layers, 2) refining training algorithms, and 3) defining new training algorithms. To understand what limitations in DistBelief led to the development of TensorFlow, we will first give a brief overview of the Parameter Server Architecture that DistBelief employed (Dean et al. 2012).\n\nYu, Yuan, Martı́n Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis, Jeff Dean, et al. 2018. “Dynamic Control Flow in Large-Scale Machine Learning.” In Proceedings of the Thirteenth EuroSys Conference, 265–83. ACM. https://doi.org/10.1145/3190508.3190551.\n\nDean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, et al. 2012. “Large Scale Distributed Deep Networks.” In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake Tahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, 1232–40. https://proceedings.neurips.cc/paper/2012/hash/6aca97005c68f1206823815f66102863-Abstract.html.\nThe Parameter Server (PS) architecture is a popular design for distributing the training of machine learning models, especially deep neural networks, across multiple machines. The fundamental idea is to separate the storage and management of model parameters from the computation used to update these parameters:\nStorage: The stateful parameter server processes handled the storage and management of model parameters. Given the large scale of models and the system’s distributed nature, these parameters were sharded across multiple parameter servers. Each server maintained a portion of the model parameters, making it \"stateful\" as it had to maintain and manage this state across the training process.\nComputation: The worker processes, which could be run in parallel, were stateless and purely computational. They processed data and computed gradients without maintaining any state or long-term memory (M. Li et al. 2014).\n\nLi, Mu, David G. Andersen, Alexander J. Smola, and Kai Yu. 2014. “Communication Efficient Distributed Machine Learning with the Parameter Server.” In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, edited by Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger, 19–27. https://proceedings.neurips.cc/paper/2014/hash/1ff1de774005f8da13f42943881c655f-Abstract.html.\n\n\n\n\n\n\nExercise 6.1: TensorFlow Core\n\n\n\n\n\nLet’s comprehensively understand core machine learning algorithms using TensorFlow and their practical applications in data analysis and predictive modeling. We will start with linear regression to predict survival rates from the Titanic dataset. Then, using TensorFlow, we will construct classifiers to identify different species of flowers based on their attributes. Next, we will use the K-Means algorithm and its application in segmenting datasets into cohesive clusters. Finally, we will apply hidden Markov models (HMM) to foresee weather patterns.\n\n\n\n\n\n\n\n\n\n\nExercise 6.2: TensorFlow Lite\n\n\n\n\n\nHere, we will see how to build a miniature machine-learning model for microcontrollers. We will build a mini neural network that is streamlined to learn from data even with limited resources and optimized for deployment by shrinking our model for efficient use on microcontrollers. TensorFlow Lite, a powerful technology derived from TensorFlow, shrinks models for tiny devices and helps enable on-device features like image recognition in smart devices. It is used in edge computing to allow for faster analysis and decisions in devices processing data locally.\n\n\n\n\nDistBelief and its architecture defined above were crucial in enabling distributed deep learning at Google but also introduced limitations that motivated the development of TensorFlow:\n\n\n6.3.2 Static Computation Graph\nModel parameters are distributed across various parameter servers in the parameter server architecture. Since DistBelief was primarily designed for the neural network paradigm, parameters corresponded to a fixed neural network structure. If the computation graph were dynamic, the distribution and coordination of parameters would become significantly more complicated. For example, a change in the graph might require the initialization of new parameters or the removal of existing ones, complicating the management and synchronization tasks of the parameter servers. This made it harder to implement models outside the neural framework or models that required dynamic computation graphs.\nTensorFlow was designed as a more general computation framework that expresses computation as a data flow graph. This allows for a wider variety of machine learning models and algorithms outside of neural networks and provides flexibility in refining models.\n\n\n6.3.3 Usability & Deployment\nThe parameter server model delineates roles (worker nodes and parameter servers) and is optimized for data center deployments, which might only be optimal for some use cases. For instance, this division introduces overheads or complexities on edge devices or in other non-data center environments.\nTensorFlow was built to run on multiple platforms, from mobile devices and edge devices to cloud infrastructure. It also aimed to be lighter and developer-friendly and to provide ease of use between local and distributed training.\n\n\n6.3.4 Architecture Design\nRather than using the parameter server architecture, TensorFlow deploys tasks across a cluster. These tasks are named processes that can communicate over a network, and each can execute TensorFlow’s core construct, the dataflow graph, and interface with various computing devices (like CPUs or GPUs). This graph is a directed representation where nodes symbolize computational operations, and edges depict the tensors (data) flowing between these operations.\nDespite the absence of traditional parameter servers, some “PS tasks” still store and manage parameters reminiscent of parameter servers in other systems. The remaining tasks, which usually handle computation, data processing, and gradient calculations, are referred to as \"worker tasks.\" TensorFlow’s PS tasks can execute any computation representable by the dataflow graph, meaning they aren’t just limited to parameter storage, and the computation can be distributed. This capability makes them significantly more versatile and gives users the power to program the PS tasks using the standard TensorFlow interface, the same one they’d use to define their models. As mentioned above, dataflow graphs’ structure also makes them inherently good for parallelism, allowing for the processing of large datasets.\n\n\n6.3.5 Built-in Functionality & Keras\nTensorFlow includes libraries to help users develop and deploy more use-case-specific models, and since this framework is open-source, this list continues to grow. These libraries address the entire ML development lifecycle: data preparation, model building, deployment, and responsible AI.\nOne of TensorFlow’s biggest advantages is its integration with Keras, though, as we will cover in the next section, Pytorch recently added a Keras integration. Keras is another ML framework built to be extremely user-friendly and, as a result, has a high level of abstraction. We will cover Keras in more depth later in this chapter. However, when discussing its integration with TensorFlow, it was important to note that it was originally built to be backend-agnostic. This means users could abstract away these complexities, offering a cleaner, more intuitive way to define and train models without worrying about compatibility issues with different backends. TensorFlow users had some complaints about the usability and readability of TensorFlow’s API, so as TF gained prominence, it integrated Keras as its high-level API. This integration offered major benefits to TensorFlow users since it introduced more intuitive readability and portability of models while still taking advantage of powerful backend features, Google support, and infrastructure to deploy models on various platforms.\n\n\n\n\n\n\nExercise 6.3: Exploring Keras: Building, Training, and Evaluating Neural Networks\n\n\n\n\n\nHere, we’ll learn how to use Keras, a high-level neural network API, for model development and training. We will explore the functional API for concise model building, understand loss and metric classes for model evaluation, and use built-in optimizers to update model parameters during training. Additionally, we’ll discover how to define custom layers and metrics tailored to our needs. Lastly, we’ll delve into Keras’ training loops to streamline the process of training neural networks on large datasets. This knowledge will empower us to build and optimize neural network models across various machine learning and artificial intelligence applications.\n\n\n\n\n\n\n6.3.6 Limitations and Challenges\nTensorFlow is one of the most popular deep learning frameworks but has criticisms and weaknesses, mostly focusing on usability and resource usage. While advantageous, the rapid pace of updates through its support from Google has sometimes led to backward compatibility issues, deprecated functions, and shifting documentation. Additionally, even with the Keras implementation, TensorFlow’s syntax and learning curve can be difficult for new users. One major critique of TensorFlow is its high overhead and memory consumption due to the range of built-in libraries and support. Some of these concerns can be addressed using pared-down versions, but they can still be limited in resource-constrained environments.\n\n\n6.3.7 PyTorch vs. TensorFlow\nPyTorch and TensorFlow have established themselves as frontrunners in the industry. Both frameworks offer robust functionalities but differ in design philosophies, ease of use, ecosystem, and deployment capabilities.\nDesign Philosophy and Programming Paradigm: PyTorch uses a dynamic computational graph termed eager execution. This makes it intuitive and facilitates debugging since operations are executed immediately and can be inspected on the fly. In comparison, earlier versions of TensorFlow were centered around a static computational graph, which required the graph’s complete definition before execution. However, TensorFlow 2.0 introduced eager execution by default, making it more aligned with PyTorch. PyTorch’s dynamic nature and Python-based approach have enabled its simplicity and flexibility, particularly for rapid prototyping. TensorFlow’s static graph approach in its earlier versions had a steeper learning curve; the introduction of TensorFlow 2.0, with its Keras integration as the high-level API, has significantly simplified the development process.\nDeployment: PyTorch is heavily favored in research environments; deploying PyTorch models in production settings was traditionally challenging. However, deployment has become more feasible with the introduction of TorchScript and the TorchServe tool. One of TensorFlow’s strengths lies in its scalability and deployment capabilities, especially on embedded and mobile platforms with TensorFlow Lite. TensorFlow Serving and TensorFlow.js further facilitate deployment in various environments, thus giving it a broader reach in the ecosystem.\nPerformance: Both frameworks offer efficient hardware acceleration for their operations. However, TensorFlow has a slightly more robust optimization workflow, such as the XLA (Accelerated Linear Algebra) compiler, which can further boost performance. Its static computational graph was also advantageous for certain optimizations in the early versions.\nEcosystem: PyTorch has a growing ecosystem with tools like TorchServe for serving models and libraries like TorchVision, TorchText, and TorchAudio for specific domains. As we mentioned earlier, TensorFlow has a broad and mature ecosystem. TensorFlow Extended (TFX) provides an end-to-end platform for deploying production machine learning pipelines. Other tools and libraries include TensorFlow Lite, TensorFlow.js, TensorFlow Hub, and TensorFlow Serving.\nTable 6.1 provides a comparative analysis:\n\n\n\nTable 6.1: Comparison of PyTorch and TensorFlow.\n\n\n\n\n\n\n\n\n\n\nFeature/Aspect\nPyTorch\nTensorFlow\n\n\n\n\nDesign Philosophy\nDynamic computational graph (eager execution)\nStatic computational graph (early versions); Eager execution in TensorFlow 2.0\n\n\nDeployment\nTraditionally challenging; Improved with TorchScript & TorchServe\nScalable, especially on embedded platforms with TensorFlow Lite\n\n\nPerformance & Optimization\nEfficient GPU acceleration\nRobust optimization with XLA compiler\n\n\nEcosystem\nTorchServe, TorchVision, TorchText, TorchAudio\nTensorFlow Extended (TFX), TensorFlow Lite, TensorFlow.js, TensorFlow Hub, TensorFlow Serving\n\n\nEase of Use\nPreferred for its Pythonic approach and rapid prototyping\nInitially steep learning curve; Simplified with Keras in TensorFlow 2.0", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -572,7 +572,7 @@ "section": "6.4 Basic Framework Components", "text": "6.4 Basic Framework Components\n\n6.4.1 Tensor data structures\nTo understand tensors, let us start from the familiar concepts in linear algebra. As demonstrated in Figure 6.3, vectors can be represented as a stack of numbers in a 1-dimensional array. Matrices follow the same idea, and one can think of them as many vectors stacked on each other, making them 2 dimensional. Higher dimensional tensors work the same way. A 3-dimensional tensor is simply a set of matrices stacked on each other in another direction. Therefore, vectors and matrices can be considered special cases of tensors with 1D and 2D dimensions, respectively.\n\n\n\n\n\n\nFigure 6.3: Visualization of Tensor Data Structure.\n\n\n\nDefining formally, in machine learning, tensors are a multi-dimensional array of numbers. The number of dimensions defines the rank of the tensor. As a generalization of linear algebra, the study of tensors is called multilinear algebra. There are noticeable similarities between matrices and higher-ranked tensors. First, extending the definitions given in linear algebra to tensors, such as with eigenvalues, eigenvectors, and rank (in the linear algebra sense), is possible. Furthermore, with the way we have defined tensors, it is possible to turn higher dimensional tensors into matrices. This is critical in practice, as the multiplication of abstract representations of higher dimensional tensors is often completed by first converting them into matrices for multiplication.\nTensors offer a flexible data structure that can represent data in higher dimensions. For example, to represent color image data, for each pixel value (in 2 dimensions), one needs the color values for red, green, and blue. With tensors, it is easy to contain image data in a single 3-dimensional tensor, with each number within it representing a certain color value in a certain location of the image. Extending even further, if we wanted to store a series of images, we could extend the dimensions such that the new dimension (to create a 4-dimensional tensor) represents our different images. This is exactly what the famous MNIST dataset does, loading a single 4-dimensional tensor when one calls to load the dataset, allowing a compact representation of all the data in one place.\n\n\n6.4.2 Computational graphs\n\nGraph Definition\nComputational graphs are a key component of deep learning frameworks like TensorFlow and PyTorch. They allow us to express complex neural network architectures efficiently and differentiatedly. A computational graph consists of a directed acyclic graph (DAG) where each node represents an operation or variable, and edges represent data dependencies between them.\nFor example, a node might represent a matrix multiplication operation, taking two input matrices (or tensors) and producing an output matrix (or tensor). To visualize this, consider the simple example in Figure 7.3. The directed acyclic graph above computes \\(z = x \\times y\\), where each variable is just numbers.\n\n\n\n\n\n\nFigure 6.4: Basic example of a computational graph.\n\n\n\nUnderneath the hood, the computational graphs represent abstractions for common layers like convolutional, pooling, recurrent, and dense layers, with data including activations, weights, and biases represented in tensors. Convolutional layers form the backbone of CNN models for computer vision. They detect spatial patterns in input data through learned filters. Recurrent layers like LSTMs and GRUs enable sequential data processing for tasks like language translation. Attention layers are used in transformers to draw global context from the entire input.\nLayers are higher-level abstractions that define computations on top of those tensors. For example, a Dense layer performs a matrix multiplication and addition between input/weight/bias tensors. Note that a layer operates on tensors as inputs and outputs; the layer is not a tensor. Some key differences:\n\nLayers contain states like weights and biases. Tensors are stateless, just holding data.\nLayers can modify internal state during training. Tensors are immutable/read-only.\nLayers are higher-level abstractions. Tensors are at a lower level and directly represent data and math operations.\nLayers define fixed computation patterns. Tensors flow between layers during execution.\nLayers are used indirectly when building models. Tensors flow between layers during execution.\n\nSo, while tensors are a core data structure that layers consume and produce, layers have additional functionality for defining parameterized operations and training. While a layer configures tensor operations under the hood, the layer remains distinct from the tensor objects. The layer abstraction makes building and training neural networks much more intuitive. This abstraction enables developers to build models by stacking these layers together without implementing the layer logic. For example, calling tf.keras.layers.Conv2D in TensorFlow creates a convolutional layer. The framework handles computing the convolutions, managing parameters, etc. This simplifies model development, allowing developers to focus on architecture rather than low-level implementations. Layer abstractions utilize highly optimized implementations for performance. They also enable portability, as the same architecture can run on different hardware backends like GPUs and TPUs.\nIn addition, computational graphs include activation functions like ReLU, sigmoid, and tanh that are essential to neural networks, and many frameworks provide these as standard abstractions. These functions introduce non-linearities that enable models to approximate complex functions. Frameworks provide these as simple, predefined operations that can be used when constructing models, for example, if.nn.relu in TensorFlow. This abstraction enables flexibility, as developers can easily swap activation functions for tuning performance. Predefined activations are also optimized by the framework for faster execution.\nIn recent years, models like ResNets and MobileNets have emerged as popular architectures, with current frameworks pre-packaging these as computational graphs. Rather than worrying about the fine details, developers can utilize them as a starting point, customizing as needed by substituting layers. This simplifies and speeds up model development, avoiding reinventing architectures from scratch. Predefined models include well-tested, optimized implementations that ensure good performance. Their modular design also enables transferring learned features to new tasks via transfer learning. These predefined architectures provide high-performance building blocks to create robust models quickly.\nThese layer abstractions, activation functions, and predefined architectures the frameworks provide constitute a computational graph. When a user defines a layer in a framework (e.g., tf.keras.layers.Dense()), the framework configures computational graph nodes and edges to represent that layer. The layer parameters like weights and biases become variables in the graph. The layer computations become operation nodes (such as the x and y in the figure above). When you call an activation function like tf.nn.relu(), the framework adds a ReLU operation node to the graph. Predefined architectures are just pre-configured subgraphs that can be inserted into your model’s graph. Thus, model definition via high-level abstractions creates a computational graph—the layers, activations, and architectures we use become graph nodes and edges.\nWe implicitly construct a computational graph when defining a neural network architecture in a framework. The framework uses this graph to determine operations to run during training and inference. Computational graphs bring several advantages over raw code, and that’s one of the core functionalities that is offered by a good ML framework:\n\nExplicit representation of data flow and operations\nAbility to optimize graph before execution\nAutomatic differentiation for training\nLanguage agnosticism - graph can be translated to run on GPUs, TPUs, etc.\nPortability - graph can be serialized, saved, and restored later\n\nComputational graphs are the fundamental building blocks of ML frameworks. Model definition via high-level abstractions creates a computational graph—the layers, activations, and architectures we use become graph nodes and edges. The framework compilers and optimizers operate on this graph to generate executable code. The abstractions provide a developer-friendly API for building computational graphs. Under the hood, it’s still graphs down! So, while you may not directly manipulate graphs as a framework user, they enable your high-level model specifications to be efficiently executed. The abstractions simplify model-building, while computational graphs make it possible.\n\n\nStatic vs. Dynamic Graphs\nDeep learning frameworks have traditionally followed one of two approaches for expressing computational graphs.\nStatic graphs (declare-then-execute): With this model, the entire computational graph must be defined upfront before running it. All operations and data dependencies must be specified during the declaration phase. TensorFlow originally followed this static approach - models were defined in a separate context, and then a session was created to run them. The benefit of static graphs is they allow more aggressive optimization since the framework can see the full graph. However, it also tends to be less flexible for research and interactivity. Changes to the graph require re-declaring the full model.\nFor example:\nx = tf.placeholder(tf.float32)\ny = tf.matmul(x, weights) + biases\nThe model is defined separately from execution, like building a blueprint. For TensorFlow 1. x, this is done using tf.Graph(). All ops and variables must be declared upfront. Subsequently, the graph is compiled and optimized before running. Execution is done later by feeding in tensor values.\nDynamic graphs (define-by-run): Unlike declaring (all) first and then executing, the graph is built dynamically as execution happens. There is no separate declaration phase - operations execute immediately as defined. This style is imperative and flexible, facilitating experimentation.\nPyTorch uses dynamic graphs, building the graph on the fly as execution happens. For example, consider the following code snippet, where the graph is built as the execution is taking place:\nx = torch.randn(4,784)\ny = torch.matmul(x, weights) + biases\nThe above example does not have separate compile/build/run phases. Ops define and execute immediately. With dynamic graphs, the definition is intertwined with execution, providing a more intuitive, interactive workflow. However, the downside is that there is less potential for optimization since the framework only sees the graph as it is built.\nRecently, however, the distinction has blurred as frameworks adopt both modes. TensorFlow 2.0 defaults to dynamic graph mode while letting users work with static graphs when needed. Dynamic declaration makes frameworks easier to use, while static models provide optimization benefits. The ideal framework offers both options.\nStatic graph declaration provides optimization opportunities but less interactivity. While dynamic execution offers flexibility and ease of use, it may have performance overhead. Here is a table comparing the pros and cons of static vs dynamic execution graphs:\n\n\n\n\n\n\n\n\nExecution Graph\nPros\nCons\n\n\n\n\nStatic (Declare-then-execute)\nEnable graph optimizations by seeing full model ahead of timeCan export and deploy frozen graphsGraph is packaged independently of code\nLess flexible for research and iterationChanges require rebuilding graphExecution has separate compile and run phases\n\n\nDynamic (Define-by-run)\nIntuitive imperative style like Python codeInterleave graph build with executionEasy to modify graphsDebugging seamlessly fits workflow\nHarder to optimize without full graphPossible slowdowns from graph building during executionCan require more memory\n\n\n\n\n\n\n6.4.3 Data Pipeline Tools\nComputational graphs can only be as good as the data they learn from and work on. Therefore, feeding training data efficiently is crucial for optimizing deep neural network performance, though it is often overlooked as one of the core functionalities. Many modern AI frameworks provide specialized pipelines to ingest, process, and augment datasets for model training.\n\nData Loaders\nThese pipelines’ cores are data loaders, which handle reading examples from storage formats like CSV files or image folders. Reading training examples from sources like files, databases, object storage, etc., is the job of the data loaders. Deep learning models require diverse data formats depending on the application. Among the popular formats is CSV, a versatile, simple format often used for tabular data. TFRecord: TensorFlow’s proprietary format, optimized for performance. Parquet: Columnar storage, offering efficient data compression and retrieval. JPEG/PNG: Commonly used for image data. WAV/MP3: Prevalent formats for audio data. For instance, tf.data is TensorFlows’s dataloading pipeline: https://www.tensorflow.org/guide/data.\nData loaders batch examples to leverage vectorization support in hardware. Batching refers to grouping multiple data points for simultaneous processing, leveraging the vectorized computation capabilities of hardware like GPUs. While typical batch sizes range from 32 to 512 examples, the optimal size often depends on the data’s memory footprint and the specific hardware constraints. Advanced loaders can stream virtually unlimited datasets from disk and cloud storage. They stream large datasets from disks or networks instead of fully loading them into memory, enabling unlimited dataset sizes.\nData loaders can also shuffle data across epochs for randomization and preprocess features in parallel with model training to expedite the training process. Randomly shuffling the order of examples between training epochs reduces bias and improves generalization.\nData loaders also support caching and prefetching strategies to optimize data delivery for fast, smooth model training. Caching preprocessed batches in memory allows them to be reused efficiently during multiple training steps and eliminates redundant processing. Prefetching, conversely, involves preloading subsequent batches, ensuring that the model never idles waiting for data.\n\n\n\n6.4.4 Data Augmentation\nBesides loading, data augmentation expands datasets synthetically. Augmentations apply random transformations for images like flipping, cropping, rotating, altering color, adding noise, etc. For audio, common augmentations involve mixing clips with background noise or modulating speed/pitch/volume.\nAugmentations increase variation in the training data. Frameworks like TensorFlow and PyTorch simplify applying random augmentations each epoch by integrating them into the data pipeline. By programmatically increasing variation in the training data distribution, augmentations reduce Overfitting and improve model generalization.\nMany frameworks simplify integrating augmentations into the data pipeline, applying them on the fly each epoch. Together, performant data loaders and extensive augmentations enable practitioners to feed massive, varied datasets to neural networks efficiently. Hands-off data pipelines represent a significant improvement in usability and productivity. They allow developers to focus more on model architecture and less on data wrangling when training deep learning models.\n\n\n6.4.5 Optimization Algorithms\nTraining a neural network is fundamentally an iterative process that seeks to minimize a loss function. The goal is to fine-tune the model weights and parameters to produce predictions close to the true target labels. Machine learning frameworks have greatly streamlined this process by offering extensive support in three critical areas: loss functions, optimization algorithms, and regularization techniques.\nLoss Functions are useful to quantify the difference between the model’s predictions and the true values. Different datasets require a different loss function to perform properly, as the loss function tells the computer the “objective” for it to aim. Commonly used loss functions are Mean Squared Error (MSE) for regression tasks and Cross-Entropy Loss for classification tasks.\nTo demonstrate some of the loss functions, imagine you have a set of inputs and the corresponding outputs, \\(Y_n\\), that denote the output of \\(n\\)’th value. The inputs are fed into the model, and the model outputs a prediction, which we can call \\(\\hat{Y_n}\\). With the predicted value and the real value, we can, for example, use the MSE to calculate the loss function:\n\\[MSE = \\frac{1}{N}\\sum_{n=1}^{N}(Y_n - \\hat{Y_n})^2\\]\nIf the problem is a classification problem, we do not want to use the MSE since the distance between the predicted value and the real value does not have significant meaning. For example, if one wants to recognize handwritten models, while 9 is further away from 2, it does not mean that the model is wrong in making the prediction. Therefore, we use the cross-entropy loss function, which is defined as:\n\\[Cross-Entropy = -\\sum_{n=1}^{N}Y_n\\log(\\hat{Y_n})\\]\nOnce a loss like the above is computed, we need methods to adjust the model’s parameters to reduce this loss or error during the training process. To do so, current frameworks use a gradient-based approach, which computes how much changes tuning the weights in a certain way changes the value of the loss function. Knowing this gradient, the model moves in the direction that reduces the gradient. Many challenges are associated with this, primarily stemming from the fact that the optimization problem could not be more, making it very easy to solve. More details about this will come in the AI Training section. Modern frameworks come equipped with efficient implementations of several optimization algorithms, many of which are variants of gradient descent algorithms with stochastic methods and adaptive learning rates. More information with clear examples can be found in the AI Training section.\nLastly, overly complex models tend to overfit, meaning they perform well on the training data but must generalize to new, unseen data (see Overfitting). To counteract this, regularization methods are employed to penalize model complexity and encourage it to learn simpler patterns. Dropout randomly sets a fraction of input units to 0 at each update during training, which helps prevent Overfitting.\nHowever, there are cases where the problem is more complex than the model can represent, which may result in underfitting. Therefore, choosing the right model architecture is also a critical step in the training process. Further heuristics and techniques are discussed in the AI Training section.\nFrameworks also efficiently implement gradient descent, Adagrad, Adadelta, and Adam. Adding regularization, such as dropout and L1/L2 penalties, prevents Overfitting during training. Batch normalization accelerates training by normalizing inputs to layers.\n\n\n6.4.6 Model Training Support\nA compilation step is required before training a defined neural network model. During this step, the neural network’s high-level architecture is transformed into an optimized, executable format. This process comprises several steps. The first step is to construct the computational graph, which represents all the mathematical operations and data flow within the model. We discussed this earlier.\nDuring training, the focus is on executing the computational graph. Every parameter within the graph, such as weights and biases, is assigned an initial value. Depending on the chosen initialization method, this value might be random or based on a predefined logic.\nThe next critical step is memory allocation. Essential memory is reserved for the model’s operations on both CPUs and GPUs, ensuring efficient data processing. The model’s operations are then mapped to the available hardware resources, particularly GPUs or TPUs, to expedite computation. Once the compilation is finalized, the model is prepared for training.\nThe training process employs various tools to enhance efficiency. Batch processing is commonly used to maximize computational throughput. Techniques like vectorization enable operations on entire data arrays rather than proceeding element-wise, which bolsters speed. Optimizations such as kernel fusion (refer to the Optimizations chapter) amalgamate multiple operations into a single action, minimizing computational overhead. Operations can also be segmented into phases, facilitating the concurrent processing of different mini-batches at various stages.\nFrameworks consistently checkpoint the state, preserving intermediate model versions during training. This ensures that progress is recovered if an interruption occurs, and training can be recommenced from the last checkpoint. Additionally, the system vigilantly monitors the model’s performance against a validation data set. Should the model begin to overfit (if its performance on the validation set declines), training is automatically halted, conserving computational resources and time.\nML frameworks incorporate a blend of model compilation, enhanced batch processing methods, and utilities such as checkpointing and early stopping. These resources manage the complex aspects of performance, enabling practitioners to zero in on model development and training. As a result, developers experience both speed and ease when utilizing neural networks’ capabilities.\n\n\n6.4.7 Validation and Analysis\nAfter training deep learning models, frameworks provide utilities to evaluate performance and gain insights into the models’ workings. These tools enable disciplined experimentation and debugging.\n\nEvaluation Metrics\nFrameworks include implementations of common evaluation metrics for validation:\n\nAccuracy - Fraction of correct predictions overall. They are widely used for classification.\nPrecision - Of positive predictions, how many were positive. Useful for imbalanced datasets.\nRecall - Of actual positives, how many did we predict correctly? Measures completeness.\nF1-score - Harmonic mean of precision and recall. Combines both metrics.\nAUC-ROC - Area under ROC curve. They are used for classification threshold analysis.\nMAP - Mean Average Precision. Evaluate ranked predictions in retrieval/detection.\nConfusion Matrix - Matrix that shows the true positives, true negatives, false positives, and false negatives. Provides a more detailed view of classification performance.\n\nThese metrics quantify model performance on validation data for comparison.\n\n\nVisualization\nVisualization tools provide insight into models:\n\nLoss curves - Plot training and validation loss over time to spot Overfitting.\nActivation grids - Illustrate features learned by convolutional filters.\nProjection - Reduce dimensionality for intuitive visualization.\nPrecision-recall curves - Assess classification tradeoffs.\n\nTools like TensorBoard for TensorFlow and TensorWatch for PyTorch enable real-time metrics and visualization during training.\n\n\n\n6.4.8 Differentiable programming\nMachine learning training methods such as backpropagation rely on the change in the loss function with respect to the change in weights (which essentially is the definition of derivatives). Thus, the ability to quickly and efficiently train large machine learning models relies on the computer’s ability to take derivatives. This makes differentiable programming one of the most important elements of a machine learning framework.\nWe can use four primary methods to make computers take derivatives. First, we can manually figure out the derivatives by hand and input them into the computer. This would quickly become a nightmare with many layers of neural networks if we had to compute all the derivatives in the backpropagation steps by hand. Another method is symbolic differentiation using computer algebra systems such as Mathematica, which can introduce a layer of inefficiency, as there needs to be a level of abstraction to take derivatives. Numerical derivatives, the practice of approximating gradients using finite difference methods, suffer from many problems, including high computational costs and larger grid sizes, leading to many errors. This leads to automatic differentiation, which exploits the primitive functions that computers use to represent operations to obtain an exact derivative. With automatic differentiation, the computational complexity of computing the gradient is proportional to computing the function itself. Intricacies of automatic differentiation are not dealt with by end users now, but resources to learn more can be found widely, such as from here. Today’s automatic differentiation and differentiable programming are ubiquitous and are done efficiently and automatically by modern machine learning frameworks.\n\n\n6.4.9 Hardware Acceleration\nThe trend to continuously train and deploy larger machine-learning models has made hardware acceleration support necessary for machine-learning platforms. Figure 6.5 shows the large number of companies that are offering hardware accelerators in different domains, such as “Very Low Power” and “Embedded” machine learning. Deep layers of neural networks require many matrix multiplications, which attract hardware that can compute matrix operations quickly and in parallel. In this landscape, two hardware architectures, the GPU and TPU, have emerged as leading choices for training machine learning models.\nThe use of hardware accelerators began with AlexNet, which paved the way for future works to utilize GPUs as hardware accelerators for training computer vision models. GPUs, or Graphics Processing Units, excel in handling many computations at once, making them ideal for the matrix operations central to neural network training. Their architecture, designed for rendering graphics, is perfect for the mathematical operations required in machine learning. While they are very useful for machine learning tasks and have been implemented in many hardware platforms, GPUs are still general purpose in that they can be used for other applications.\nOn the other hand, Tensor Processing Units (TPU) are hardware units designed specifically for neural networks. They focus on the multiply and accumulate (MAC) operation, and their hardware consists of a large hardware matrix that contains elements that efficiently compute the MAC operation. This concept, called the systolic array architecture, was pioneered by Kung and Leiserson (1979), but has proven to be a useful structure to efficiently compute matrix products and other operations within neural networks (such as convolutions).\n\nKung, Hsiang Tsung, and Charles E Leiserson. 1979. “Systolic Arrays (for VLSI).” In Sparse Matrix Proceedings 1978, 1:256–82. Society for industrial; applied mathematics Philadelphia, PA, USA.\nWhile TPUs can drastically reduce training times, they also have disadvantages. For example, many operations within the machine learning frameworks (primarily TensorFlow here since the TPU directly integrates with it) are not supported by TPUs. They cannot also support custom operations from the machine learning frameworks, and the network design must closely align with the hardware capabilities.\nToday, NVIDIA GPUs dominate training, aided by software libraries like CUDA, cuDNN, and TensorRT. Frameworks also include optimizations to maximize performance on these hardware types, like pruning unimportant connections and fusing layers. Combining these techniques with hardware acceleration provides greater efficiency. For inference, hardware is increasingly moving towards optimized ASICs and SoCs. Google’s TPUs accelerate models in data centers. Apple, Qualcomm, and others now produce AI-focused mobile chips. The NVIDIA Jetson family targets autonomous robots.\n\n\n\n\n\n\nFigure 6.5: Companies offering ML hardware accelerators. Credit: Gradient Flow.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -583,7 +583,7 @@ "section": "6.5 Advanced Features", "text": "6.5 Advanced Features\n\n6.5.1 Distributed training\nAs machine learning models have become larger over the years, it has become essential for large models to utilize multiple computing nodes in the training process. This process, distributed learning, has allowed for higher training capabilities but has also imposed challenges in implementation.\nWe can consider three different ways to spread the work of training machine learning models to multiple computing nodes. Input data partitioning refers to multiple processors running the same model on different input partitions. This is the easiest implementation and is available for many machine learning frameworks. The more challenging distribution of work comes with model parallelism, which refers to multiple computing nodes working on different parts of the model, and pipelined model parallelism, which refers to multiple computing nodes working on different layers of the model on the same input. The latter two mentioned here are active research areas.\nML frameworks that support distributed learning include TensorFlow (through its tf.distribute module), PyTorch (through its torch.nn.DataParallel and torch.nn.DistributedDataParallel modules), and MXNet (through its gluon API).\n\n\n6.5.2 Model Conversion\nMachine learning models have various methods to be represented and used within different frameworks and for different device types. For example, a model can be converted to be compatible with inference frameworks within the mobile device. The default format for TensorFlow models is checkpoint files containing weights and architectures, which are needed to retrain the models. However, models are typically converted to TensorFlow Lite format for mobile deployment. TensorFlow Lite uses a compact flat buffer representation and optimizations for fast inference on mobile hardware, discarding all the unnecessary baggage associated with training metadata, such as checkpoint file structures.\nThe default format for TensorFlow models is checkpoint files containing weights and architectures. For mobile deployment, models are typically converted to TensorFlow Lite format. TensorFlow Lite uses a compact flat buffer representation and optimizations for fast inference on mobile hardware.\nModel optimizations like quantization (see Optimizations chapter) can further optimize models for target architectures like mobile. This reduces the precision of weights and activations to uint8 or int8 for a smaller footprint and faster execution with supported hardware accelerators. For post-training quantization, TensorFlow’s converter handles analysis and conversion automatically.\nFrameworks like TensorFlow simplify deploying trained models to mobile and embedded IoT devices through easy conversion APIs for TFLite format and quantization. Ready-to-use conversion enables high-performance inference on mobile without a manual optimization burden. Besides TFLite, other common targets include TensorFlow.js for web deployment, TensorFlow Serving for cloud services, and TensorFlow Hub for transfer learning. TensorFlow’s conversion utilities handle these scenarios to streamline end-to-end workflows.\nMore information about model conversion in TensorFlow is linked here.\n\n\n6.5.3 AutoML, No-Code/Low-Code ML\nIn many cases, machine learning can have a relatively high barrier of entry compared to other fields. To successfully train and deploy models, one needs to have a critical understanding of a variety of disciplines, from data science (data processing, data cleaning), model structures (hyperparameter tuning, neural network architecture), hardware (acceleration, parallel processing), and more depending on the problem at hand. The complexity of these problems has led to the introduction of frameworks such as AutoML, which aims to make “Machine learning available for non-Machine Learning exports” and to “automate research in machine learning.” They have constructed AutoWEKA, which aids in the complex process of hyperparameter selection, and Auto-sklearn and Auto-pytorch, an extension of AutoWEKA into the popular sklearn and PyTorch Libraries.\nWhile these efforts to automate parts of machine learning tasks are underway, others have focused on making machine learning models easier by deploying no-code/low-code machine learning, utilizing a drag-and-drop interface with an easy-to-navigate user interface. Companies such as Apple, Google, and Amazon have already created these easy-to-use platforms to allow users to construct machine learning models that can integrate into their ecosystem.\nThese steps to remove barriers to entry continue to democratize machine learning, make it easier for beginners to access, and simplify workflow for experts.\n\n\n6.5.4 Advanced Learning Methods\n\nTransfer Learning\nTransfer learning is the practice of using knowledge gained from a pre-trained model to train and improve the performance of a model for a different task. For example, datasets trained on ImageNet datasets such as MobileNet and ResNet can help classify other image datasets. To do so, one may freeze the pre-trained model, utilizing it as a feature extractor to train a much smaller model built on top of the feature extraction. One can also fine-tune the entire model to fit the new task.\nTransfer learning has challenges, such as the modified model’s inability to conduct its original tasks after transfer learning. Papers such as “Learning without Forgetting” by Z. Li and Hoiem (2018) aims to address these challenges and have been implemented in modern machine learning platforms.\n\nLi, Zhizhong, and Derek Hoiem. 2018. “Learning Without Forgetting.” IEEE Trans. Pattern Anal. Mach. Intell. 40 (12): 2935–47. https://doi.org/10.1109/tpami.2017.2773081.\n\n\nFederated Learning\nConsider the problem of labeling items in a photo from personal devices and moving the image data from the devices to a central server, where a single model will train Using the image data provided by the devices. However, this presents many potential challenges. First, with many devices, one needs a massive network infrastructure to move and store data from these devices to a central location. With the number of devices present today, this is often not feasible and very costly. Furthermore, privacy challenges like those of Photos central servers are associated with moving personal data.\nFederated learning by McMahan et al. (2017) is a form of distributed computing that resolves these issues by distributing the models into personal devices for them to be trained on devices (Figure 6.6). Initially, a base global model is trained on a central server to be distributed to all devices. Using this base model, the devices individually compute the gradients and send them back to the central hub. Intuitively, this transfers model parameters instead of the data itself. This innovative approach allows the model to be trained with many different datasets (in our example, the set of images on personal devices) without transferring a large amount of potentially sensitive data. However, federated learning also comes with a series of challenges.\n\nMcMahan, Brendan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. 2017. “Communication-Efficient Learning of Deep Networks from Decentralized Data.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, edited by Aarti Singh and Xiaojin (Jerry) Zhu, 54:1273–82. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v54/mcmahan17a.html.\nData collected from devices may come with something other than suitable labels in many real-world situations. Users compound this issue; the primary data source can often be unreliable. This unreliability means that even when data is labeled, its accuracy or relevance is not guaranteed. Furthermore, each user’s data is unique, resulting in a significant variance in the data generated by different users. This non-IID nature of data, coupled with the unbalanced data production where some users generate more data than others, can adversely impact the performance of the global model. Researchers have worked to compensate for this by adding a proximal term to balance the local and global model and adding a frozen global hypersphere classifier.\nAdditional challenges are associated with federated learning. The number of mobile device owners can far exceed the average number of training samples on each device, leading to substantial communication overhead. This issue is particularly pronounced in the context of mobile networks, which are often used for such communication and can be unstable. This instability can result in delayed or failed transmission of model updates, thereby affecting the overall training process.\nThe heterogeneity of device resources is another hurdle. Devices participating in Federated Learning can have varying computational powers and memory capacities. This diversity makes it challenging to design efficient algorithms across all devices. Privacy and security issues are not a guarantee for federated learning. Techniques such as inversion gradient attacks can extract information about the training data from the model parameters. Despite these challenges, the many potential benefits continue to make it a popular research area. Open source programs such as Flower have been developed to simplify implementing federated learning with various machine learning frameworks.\nFigure 6.6 illustrates an example of federated learning. Consider a model used for medical predictions by diffrent hospitals. Given that medical data is extremely sensitive and must be kept private, it can’t be transferred to a centralized server for training. Instead, each hospital would firen-tune/train the base model using its own private data, while only communicating non-sensitive information with the Federated Server, such as the learned parameters.\n\n\n\n\n\n\nFigure 6.6: A centralized-server approach to federated learning. Credit: NVIDIA.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -594,7 +594,7 @@ "section": "6.6 Framework Specialization", "text": "6.6 Framework Specialization\nThus far, we have talked about ML frameworks generally. However, typically, frameworks are optimized based on the target environment’s computational capabilities and application requirements, ranging from the cloud to the edge to tiny devices. Choosing the right framework is crucial based on the target environment for deployment. This section provides an overview of the major types of AI frameworks tailored for cloud, edge, and TinyML environments to help understand the similarities and differences between these ecosystems.\n\n6.6.1 Cloud\nCloud-based AI frameworks assume access to ample computational power, memory, and storage resources in the cloud. They generally support both training and inference. Cloud-based AI frameworks are suited for applications where data can be sent to the cloud for processing, such as cloud-based AI services, large-scale data analytics, and web applications. Popular cloud AI frameworks include the ones we mentioned earlier, such as TensorFlow, PyTorch, MXNet, Keras, etc. These frameworks utilize GPUs, TPUs, distributed training, and AutoML to deliver scalable AI. Concepts like model serving, MLOps, and AIOps relate to the operationalization of AI in the cloud. Cloud AI powers services like Google Cloud AI and enables transfer learning using pre-trained models.\n\n\n6.6.2 Edge\nEdge AI frameworks are tailored to deploy AI models on IoT devices, smartphones, and edge servers. Edge AI frameworks are optimized for devices with moderate computational resources, balancing power and performance. Edge AI frameworks are ideal for applications requiring real-time or near-real-time processing, including robotics, autonomous vehicles, and smart devices. Key edge AI frameworks include TensorFlow Lite, PyTorch Mobile, CoreML, and others. They employ optimizations like model compression, quantization, and efficient neural network architectures. Hardware support includes CPUs, GPUs, NPUs, and accelerators like the Edge TPU. Edge AI enables use cases like mobile vision, speech recognition, and real-time anomaly detection.\n\n\n6.6.3 Embedded\nTinyML frameworks are specialized for deploying AI models on extremely resource-constrained devices, specifically microcontrollers and sensors within the IoT ecosystem. TinyML frameworks are designed for devices with limited resources, emphasizing minimal memory and power consumption. TinyML frameworks are specialized for use cases on resource-constrained IoT devices for predictive maintenance, gesture recognition, and environmental monitoring applications. Major TinyML frameworks include TensorFlow Lite Micro, uTensor, and ARM NN. They optimize complex models to fit within kilobytes of memory through techniques like quantization-aware training and reduced precision. TinyML allows intelligent sensing across battery-powered devices, enabling collaborative learning via federated learning. The choice of framework involves balancing model performance and computational constraints of the target platform, whether cloud, edge, or TinyML. Table 6.2 compares the major AI frameworks across cloud, edge, and TinyML environments:\n\n\n\nTable 6.2: Comparison of framework types for Cloud AI, Edge AI, and TinyML.\n\n\n\n\n\n\n\n\n\n\n\nFramework Type\nExamples\nKey Technologies\nUse Cases\n\n\n\n\nCloud AI\nTensorFlow, PyTorch, MXNet, Keras\nGPUs, TPUs, distributed training, AutoML, MLOps\nCloud services, web apps, big data analytics\n\n\nEdge AI\nTensorFlow Lite, PyTorch Mobile, Core ML\nModel optimization, compression, quantization, efficient NN architectures\nMobile apps, robots, autonomous systems, real-time processing\n\n\nTinyML\nTensorFlow Lite Micro, uTensor, ARM NN\nQuantization-aware training, reduced precision, neural architecture search\nIoT sensors, wearables, predictive maintenance, gesture recognition\n\n\n\n\n\n\nKey differences:\n\nCloud AI leverages massive computational power for complex models using GPUs/TPUs and distributed training\nEdge AI optimizes models to run locally on resource-constrained edge devices.\nTinyML fits models into extremely low memory and computes environments like microcontrollers", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -605,7 +605,7 @@ "section": "6.7 Embedded AI Frameworks", "text": "6.7 Embedded AI Frameworks\n\n6.7.1 Resource Constraints\nEmbedded systems face severe resource constraints that pose unique challenges when deploying machine learning models compared to traditional computing platforms. For example, microcontroller units (MCUs) commonly used in IoT devices often have:\n\nRAM ranges from tens of kilobytes to a few megabytes. The popular ESP8266 MCU has around 80KB RAM available to developers. This contrasts with 8GB or more on typical laptops and desktops today.\nFlash storage ranges from hundreds of kilobytes to a few megabytes. The Arduino Uno microcontroller provides just 32KB of code storage. Standard computers today have disk storage in the order of terabytes.\nProcessing power from just a few MHz to approximately 200MHz. The ESP8266 operates at 80MHz. This is several orders of magnitude slower than multi-GHz multi-core CPUs in servers and high-end laptops.\n\nThese tight constraints often make training machine learning models directly on microcontrollers infeasible. The limited RAM precludes handling large datasets for training. Energy usage for training would also quickly deplete battery-powered devices. Instead, models are trained on resource-rich systems and deployed on microcontrollers for optimized inference. But even inference poses challenges:\n\nModel Size: AI models are too large to fit on embedded and IoT devices. This necessitates model compression techniques, such as quantization, pruning, and knowledge distillation. Additionally, as we will see, many of the frameworks used by developers for AI development have large amounts of overhead and built-in libraries that embedded systems can’t support.\nComplexity of Tasks: With only tens of KBs to a few MBs of RAM, IoT devices and embedded systems are constrained in the complexity of tasks they can handle. Tasks that require large datasets or sophisticated algorithms—for example, LLMs—that would run smoothly on traditional computing platforms might be infeasible on embedded systems without compression or other optimization techniques due to memory limitations.\nData Storage and Processing: Embedded systems often process data in real time and might only store small amounts locally. Conversely, traditional computing systems can hold and process large datasets in memory, enabling faster data operations analysis and real-time updates.\nSecurity and Privacy: Limited memory also restricts the complexity of security algorithms and protocols, data encryption, reverse engineering protections, and more that can be implemented on the device. This could make some IoT devices more vulnerable to attacks.\n\nConsequently, specialized software optimizations and ML frameworks tailored for microcontrollers must work within these tight resource bounds. Clever optimization techniques like quantization, pruning, and knowledge distillation compress models to fit within limited memory (see Optimizations section). Learnings from neural architecture search help guide model designs.\nHardware improvements like dedicated ML accelerators on microcontrollers also help alleviate constraints. For instance, Qualcomm’s Hexagon DSP accelerates TensorFlow Lite models on Snapdragon mobile chips. Google’s Edge TPU packs ML performance into a tiny ASIC for edge devices. ARM Ethos-U55 offers efficient inference on Cortex-M class microcontrollers. These customized ML chips unlock advanced capabilities for resource-constrained applications.\nDue to limited processing power, it’s almost always infeasible to train AI models on IoT or embedded systems. Instead, models are trained on powerful traditional computers (often with GPUs) and then deployed on the embedded device for inference. TinyML specifically deals with this, ensuring models are lightweight enough for real-time inference on these constrained devices.\n\n\n6.7.2 Frameworks & Libraries\nEmbedded AI frameworks are software tools and libraries designed to enable AI and ML capabilities on embedded systems. These frameworks are essential for bringing AI to IoT devices, robotics, and other edge computing platforms, and they are designed to work where computational resources, memory, and power consumption are limited.\n\n\n6.7.3 Challenges\nWhile embedded systems present an enormous opportunity for deploying machine learning to enable intelligent capabilities at the edge, these resource-constrained environments pose significant challenges. Unlike typical cloud or desktop environments rich with computational resources, embedded devices introduce severe constraints around memory, processing power, energy efficiency, and specialized hardware. As a result, existing machine learning techniques and frameworks designed for server clusters with abundant resources do not directly translate to embedded systems. This section uncovers some of the challenges and opportunities for embedded systems and ML frameworks.\n\nFragmented Ecosystem\nThe lack of a unified ML framework led to a highly fragmented ecosystem. Engineers at companies like STMicroelectronics, NXP Semiconductors, and Renesas had to develop custom solutions tailored to their specific microcontroller and DSP architectures. These ad-hoc frameworks required extensive manual optimization for each low-level hardware platform. This made porting models extremely difficult, requiring redevelopment for new Arm, RISC-V, or proprietary architectures.\n\n\nDisparate Hardware Needs\nWithout a shared framework, there was no standard way to assess hardware’s capabilities. Vendors like Intel, Qualcomm, and NVIDIA created integrated solutions, blending models and improving software and hardware. This made it hard to discern the sources of performance gains - whether new chip designs like Intel’s low-power x86 cores or software optimizations were responsible. A standard framework was needed so vendors could evaluate their hardware’s capabilities fairly and reproducibly.\n\n\nLack of Portability\nWith standardized tools, adapting models trained in common frameworks like TensorFlow or PyTorch to run efficiently on microcontrollers was easier. It required time-consuming manual translation of models to run on specialized DSPs from companies like CEVA or low-power Arm M-series cores. No turnkey tools were enabling portable deployment across different architectures.\n\n\nIncomplete Infrastructure\nThe infrastructure to support key model development workflows needed to be improved. More support is needed for compression techniques to fit large models within constrained memory budgets. Tools for quantization to lower precision for faster inference were missing. Standardized APIs for integration into applications were incomplete. Essential functionality like on-device debugging, metrics, and performance profiling was absent. These gaps increased the cost and difficulty of embedded ML development.\n\n\nNo Standard Benchmark\nWithout unified benchmarks, there was no standard way to assess and compare the capabilities of different hardware platforms from vendors like NVIDIA, Arm, and Ambiq Micro. Existing evaluations relied on proprietary benchmarks tailored to showcase the strengths of particular chips. This made it impossible to measure hardware improvements objectively in a fair, neutral manner. The Benchmarking AI chapter discusses this topic in more detail.\n\n\nMinimal Real-World Testing\nMuch of the benchmarks relied on synthetic data. Rigorously testing models on real-world embedded applications was difficult without standardized datasets and benchmarks, raising questions about how performance claims would translate to real-world usage. More extensive testing was needed to validate chips in actual use cases.\nThe lack of shared frameworks and infrastructure slowed TinyML adoption, hampering the integration of ML into embedded products. Recent standardized frameworks have begun addressing these issues through improved portability, performance profiling, and benchmarking support. However, ongoing innovation is still needed to enable seamless, cost-effective deployment of AI to edge devices.\n\n\nSummary\nThe absence of standardized frameworks, benchmarks, and infrastructure for embedded ML has traditionally hampered adoption. However, recent progress has been made in developing shared frameworks like TensorFlow Lite Micro and benchmark suites like MLPerf Tiny that aim to accelerate the proliferation of TinyML solutions. However, overcoming the fragmentation and difficulty of embedded deployment remains an ongoing process.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -616,7 +616,7 @@ "section": "6.8 Examples", "text": "6.8 Examples\nMachine learning deployment on microcontrollers and other embedded devices often requires specially optimized software libraries and frameworks to work within tight memory, compute, and power constraints. Several options exist for performing inference on such resource-limited hardware, each with its approach to optimizing model execution. This section will explore the key characteristics and design principles behind TFLite Micro, TinyEngine, and CMSIS-NN, providing insight into how each framework tackles the complex problem of high-accuracy yet efficient neural network execution on microcontrollers. It will also showcase different approaches for implementing efficient TinyML frameworks.\nTable 6.3 summarizes the key differences and similarities between these three specialized machine-learning inference frameworks for embedded systems and microcontrollers.\n\n\n\nTable 6.3: Comparison of frameworks: TensorFlow Lite Micro, TinyEngine, and CMSIS-NN\n\n\n\n\n\n\n\n\n\n\n\nFramework\nTensorFlow Lite Micro\nTinyEngine\nCMSIS-NN\n\n\n\n\nApproach\nInterpreter-based\nStatic compilation\nOptimized neural network kernels\n\n\nHardware Focus\nGeneral embedded devices\nMicrocontrollers\nARM Cortex-M processors\n\n\nArithmetic Support\nFloating point\nFloating point, fixed point\nFloating point, fixed point\n\n\nModel Support\nGeneral neural network models\nModels co-designed with TinyNAS\nCommon neural network layer types\n\n\nCode Footprint\nLarger due to inclusion of interpreter and ops\nSmall, includes only ops needed for model\nLightweight by design\n\n\nLatency\nHigher due to interpretation overhead\nVery low due to compiled model\nLow latency focus\n\n\nMemory Management\nDynamically managed by interpreter\nModel-level optimization\nTools for efficient allocation\n\n\nOptimization Approach\nSome code generation features\nSpecialized kernels, operator fusion\nArchitecture-specific assembly optimizations\n\n\nKey Benefits\nFlexibility, portability, ease of updating models\nMaximizes performance, optimized memory usage\nHardware acceleration, standardized API, portability\n\n\n\n\n\n\nWe will understand each of these in greater detail in the following sections.\n\n6.8.1 Interpreter\nTensorFlow Lite Micro (TFLM) is a machine learning inference framework designed for embedded devices with limited resources. It uses an interpreter to load and execute machine learning models, which provides flexibility and ease of updating models in the field (David et al. 2021).\n\nDavid, Robert, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, et al. 2021. “Tensorflow Lite Micro: Embedded Machine Learning for Tinyml Systems.” Proceedings of Machine Learning and Systems 3: 800–811.\nTraditional interpreters often have significant branching overhead, which can reduce performance. However, machine learning model interpretation benefits from the efficiency of long-running kernels, where each kernel runtime is relatively large and helps mitigate interpreter overhead.\nAn alternative to an interpreter-based inference engine is to generate native code from a model during export. This can improve performance, but it sacrifices portability and flexibility, as the generated code needs recompilation for each target platform and must be replaced entirely to modify a model.\nTFLM balances the simplicity of code compilation and the flexibility of an interpreter-based approach by incorporating certain code-generation features. For example, the library can be constructed solely from source files, offering much of the compilation simplicity associated with code generation while retaining the benefits of an interpreter-based model execution framework.\nAn interpreter-based approach offers several benefits over code generation for machine learning inference on embedded devices:\n\nFlexibility: Models can be updated in the field without recompiling the entire application.\nPortability: The interpreter can be used to execute models on different target platforms without porting the code.\nMemory efficiency: The interpreter can share code across multiple models, reducing memory usage.\nEase of development: Interpreters are easier to develop and maintain than code generators.\n\nTensorFlow Lite Micro is a powerful and flexible framework for machine learning inference on embedded devices. Its interpreter-based approach offers several benefits over code generation, including flexibility, portability, memory efficiency, and ease of development.\n\n\n6.8.2 Compiler-based\nTinyEngine is an ML inference framework designed specifically for resource-constrained microcontrollers. It employs several optimizations to enable high-accuracy neural network execution within the tight constraints of memory, computing, and storage on microcontrollers (Lin et al. 2020).\n\nLin, Ji, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han. 2020. “MCUNet: Tiny Deep Learning on IoT Devices.” In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/86c51678350f656dcc7f490a43946ee5-Abstract.html.\nWhile inference frameworks like TFLite Micro use interpreters to execute the neural network graph dynamically at runtime, this adds significant overhead regarding memory usage to store metadata, interpretation latency, and lack of optimizations. However, TFLite argues that the overhead is small. TinyEngine eliminates this overhead by employing a code generation approach. It analyzes the network graph during compilation and generates specialized code to execute just that model. This code is natively compiled into the application binary, avoiding runtime interpretation costs.\nConventional ML frameworks schedule memory per layer, trying to minimize usage for each layer separately. TinyEngine does model-level scheduling instead of analyzing memory usage across layers. It allocates a common buffer size based on the maximum memory needs of all layers. This buffer is then shared efficiently across layers to increase data reuse.\nTinyEngine also specializes in the kernels for each layer through techniques like tiling, unrolling, and fusing operators. For example, it will generate unrolled compute kernels with the number of loops needed for a 3x3 or 5x5 convolution. These specialized kernels extract maximum performance from the microcontroller hardware. It uses optimized depthwise convolutions to minimize memory allocations by computing each channel’s output in place over the input channel data. This technique exploits the channel-separable nature of depthwise convolutions to reduce peak memory size.\nLike TFLite Micro, the compiled TinyEngine binary only includes ops needed for a specific model rather than all possible operations. This results in a very small binary footprint, keeping code size low for memory-constrained devices.\nOne difference between TFLite Micro and TinyEngine is that the latter is co-designed with “TinyNAS,” an architecture search method for microcontroller models similar to differential NAS for microcontrollers. TinyEngine’s efficiency allows for exploring larger and more accurate models through NAS. It also provides feedback to TinyNAS on which models can fit within the hardware constraints.\nThrough various custom techniques, such as static compilation, model-based scheduling, specialized kernels, and co-design with NAS, TinyEngine enables high-accuracy deep learning inference within microcontrollers’ tight resource constraints.\n\n\n6.8.3 Library\nCMSIS-NN, standing for Cortex Microcontroller Software Interface Standard for Neural Networks, is a software library devised by ARM. It offers a standardized interface for deploying neural network inference on microcontrollers and embedded systems, focusing on optimization for ARM Cortex-M processors (Lai, Suda, and Chandra 2018).\n\nLai, Liangzhen, Naveen Suda, and Vikas Chandra. 2018. “Cmsis-Nn: Efficient Neural Network Kernels for Arm Cortex-m Cpus.” ArXiv Preprint abs/1801.06601. https://arxiv.org/abs/1801.06601.\nNeural Network Kernels: CMSIS-NN has highly efficient kernels that handle fundamental neural network operations such as convolution, pooling, fully connected layers, and activation functions. It caters to a broad range of neural network models by supporting floating and fixed-point arithmetic. The latter is especially beneficial for resource-constrained devices as it curtails memory and computational requirements (Quantization).\nHardware Acceleration: CMSIS-NN harnesses the power of Single Instruction, Multiple Data (SIMD) instructions available on many Cortex-M processors. This allows for parallel processing of multiple data elements within a single instruction, thereby boosting computational efficiency. Certain Cortex-M processors feature Digital Signal Processing (DSP) extensions that CMSIS-NN can exploit for accelerated neural network execution. The library also incorporates assembly-level optimizations tailored to specific microcontroller architectures to enhance performance further.\nStandardized API: CMSIS-NN offers a consistent and abstracted API that protects developers from the complexities of low-level hardware details. This makes the integration of neural network models into applications simpler. It may also encompass tools or utilities for converting popular neural network model formats into a format that is compatible with CMSIS-NN.\nMemory Management: CMSIS-NN provides functions for efficient memory allocation and management, which is vital in embedded systems where memory resources are scarce. It ensures optimal memory usage during inference and, in some instances, allows in-place operations to decrease memory overhead.\nPortability: CMSIS-NN is designed for portability across various Cortex-M processors. This enables developers to write code that can operate on different microcontrollers without significant modifications.\nLow Latency: CMSIS-NN minimizes inference latency, making it an ideal choice for real-time applications where swift decision-making is paramount.\nEnergy Efficiency: The library is designed with a focus on energy efficiency, making it suitable for battery-powered and energy-constrained devices.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -627,7 +627,7 @@ "section": "6.9 Choosing the Right Framework", "text": "6.9 Choosing the Right Framework\nChoosing the right machine learning framework for a given application requires carefully evaluating models, hardware, and software considerations. By analyzing these three aspects—models, hardware, and software—ML engineers can select the optimal framework and customize it as needed for efficient and performant on-device ML applications. The goal is to balance model complexity, hardware limitations, and software integration to design a tailored ML pipeline for embedded and edge devices.\n\n\n\n\n\n\nFigure 6.7: TensorFlow Framework Comparison - General. Credit: TensorFlow.\n\n\n\n\n6.9.1 Model\nTensorFlow supports significantly more ops than TensorFlow Lite and TensorFlow Lite Micro as it is typically used for research or cloud deployment, which require a large number of and more flexibility with operators (see Figure 6.7). TensorFlow Lite supports select ops for on-device training, whereas TensorFlow Micro does not. TensorFlow Lite also supports dynamic shapes and quantization-aware training, but TensorFlow Micro does not. In contrast, TensorFlow Lite and TensorFlow Micro offer native quantization tooling and support, where quantization refers to transforming an ML program into an approximated representation with available lower precision operations.\n\n\n6.9.2 Software\n\n\n\n\n\n\nFigure 6.8: TensorFlow Framework Comparison - Software. Credit: TensorFlow.\n\n\n\nTensorFlow Lite Micro does not have OS support, while TensorFlow and TensorFlow Lite do, to reduce memory overhead, make startup times faster, and consume less energy (see Figure 6.8). TensorFlow Lite Micro can be used in conjunction with real-time operating systems (RTOS) like FreeRTOS, Zephyr, and Mbed OS. TensorFlow Lite and TensorFlow Lite Micro support model memory mapping, allowing models to be directly accessed from flash storage rather than loaded into RAM, whereas TensorFlow does not. TensorFlow and TensorFlow Lite support accelerator delegation to schedule code to different accelerators, whereas TensorFlow Lite Micro does not, as embedded systems tend to have a limited array of specialized accelerators.\n\n\n6.9.3 Hardware\n\n\n\n\n\n\nFigure 6.9: TensorFlow Framework Comparison - Hardware. Credit: TensorFlow.\n\n\n\nTensorFlow Lite and TensorFlow Lite Micro have significantly smaller base binary sizes and memory footprints than TensorFlow (see Figure 6.9). For example, a typical TensorFlow Lite Micro binary is less than 200KB, whereas TensorFlow is much larger. This is due to the resource-constrained environments of embedded systems. TensorFlow supports x86, TPUs, and GPUs like NVIDIA, AMD, and Intel. TensorFlow Lite supports Arm Cortex-A and x86 processors commonly used on mobile phones and tablets. The latter is stripped of all the unnecessary training logic for on-device deployment. TensorFlow Lite Micro provides support for microcontroller-focused Arm Cortex M cores like M0, M3, M4, and M7, as well as DSPs like Hexagon and SHARC and MCUs like STM32, NXP Kinetis, Microchip AVR.\nSelecting the appropriate AI framework is essential to ensure that embedded systems can efficiently execute AI models. Key factors to consider when choosing a machine learning framework are ease of use, community support, performance, scalability, integration with data engineering tools, and integration with model optimization tools. By understanding these factors, you can make informed decisions and maximize the potential of your machine-learning initiatives.\n\n\n6.9.4 Other Factors\nSeveral other key factors beyond models, hardware, and software should be considered when evaluating AI frameworks for embedded systems.\n\nPerformance\nPerformance is critical in embedded systems where computational resources are limited. Evaluate the framework’s ability to optimize model inference for embedded hardware. Model quantization and hardware acceleration support are crucial in achieving efficient inference.\n\n\nScalability\nScalability is essential when considering the potential growth of an embedded AI project. The framework should support the deployment of models on various embedded devices, from microcontrollers to more powerful processors. It should also seamlessly handle both small-scale and large-scale deployments.\n\n\nIntegration with Data Engineering Tools\nData engineering tools are essential for data preprocessing and pipeline management. An ideal AI framework for embedded systems should seamlessly integrate with these tools, allowing for efficient data ingestion, transformation, and model training.\n\n\nIntegration with Model Optimization Tools\nModel optimization ensures that AI models are well-suited for embedded deployment. Evaluate whether the framework integrates with model optimization tools like TensorFlow Lite Converter or ONNX Runtime to facilitate model quantization and size reduction.\n\n\nEase of Use\nThe ease of use of an AI framework significantly impacts development efficiency. A framework with a user-friendly interface and clear documentation reduces developers’ learning curve. Consideration should be given to whether the framework supports high-level APIs, allowing developers to focus on model design rather than low-level implementation details. This factor is incredibly important for embedded systems, which have fewer features than typical developers might be accustomed to.\n\n\nCommunity Support\nCommunity support plays another essential factor. Frameworks with active and engaged communities often have well-maintained codebases, receive regular updates, and provide valuable forums for problem-solving. As a result, community support also plays into Ease of Use because it ensures that developers have access to a wealth of resources, including tutorials and example projects. Community support provides some assurance that the framework will continue to be supported for future updates. There are only a few frameworks that cater to TinyML needs. TensorFlow Lite Micro is the most popular and has the most community support.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -638,7 +638,7 @@ "section": "6.10 Future Trends in ML Frameworks", "text": "6.10 Future Trends in ML Frameworks\n\n6.10.1 Decomposition\nCurrently, the ML system stack consists of four abstractions as shown in Figure 6.10, namely (1) computational graphs, (2) tensor programs, (3) libraries and runtimes, and (4) hardware primitives.\n\n\n\n\n\n\nFigure 6.10: Four abstractions in current ML system stacks. Credit: TVM.\n\n\n\nThis has led to vertical (i.e., between abstraction levels) and horizontal (i.e., library-driven vs. compilation-driven approaches to tensor computation) boundaries, which hinder innovation for ML. Future work in ML frameworks can look toward breaking these boundaries. In December 2021, Apache TVM Unity was proposed, which aimed to facilitate interactions between the different abstraction levels (as well as the people behind them, such as ML scientists, ML engineers, and hardware engineers) and co-optimize decisions in all four abstraction levels.\n\n\n6.10.2 High-Performance Compilers & Libraries\nAs ML frameworks further develop, high-performance compilers and libraries will continue to emerge. Some current examples include TensorFlow XLA and Nvidia’s CUTLASS, which accelerate linear algebra operations in computational graphs, and Nvidia’s TensorRT, which accelerates and optimizes inference.\n\n\n6.10.3 ML for ML Frameworks\nWe can also use ML to improve ML frameworks in the future. Some current uses of ML for ML frameworks include:\n\nhyperparameter optimization using techniques such as Bayesian optimization, random search, and grid search\nneural architecture search (NAS) to automatically search for optimal network architectures\nAutoML, which as described in Section 6.5, automates the ML pipeline.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -649,7 +649,7 @@ "section": "6.11 Conclusion", "text": "6.11 Conclusion\nIn summary, selecting the optimal framework requires thoroughly evaluating options against criteria like usability, community support, performance, hardware compatibility, and model conversion abilities. There is no universal best solution, as the right framework depends on the specific constraints and use case.\nTensorFlow Lite Micro currently provides a strong starting point for extremely resource-constrained microcontroller-based platforms. Its comprehensive optimization tooling, such as quantization mapping and kernel optimizations, enables high performance on devices like Arm Cortex-M and RISC-V processors. The active developer community ensures accessible technical support. Seamless integration with TensorFlow for training and converting models makes the workflow cohesive.\nFor platforms with more capable CPUs like Cortex-A, TensorFlow Lite for Microcontrollers expands possibilities. It provides greater flexibility for custom and advanced models beyond the core operators in TFLite Micro. However, this comes at the cost of a larger memory footprint. These frameworks are ideal for automotive systems, drones, and more powerful edge devices that can benefit from greater model sophistication.\nFrameworks specifically built for specialized hardware like CMSIS-NN on Cortex-M processors can further maximize performance but sacrifice portability. Integrated frameworks from processor vendors tailor the stack to their architectures, unlocking the full potential of their chips but locking you into their ecosystem.\nUltimately, choosing the right framework involves finding the best match between its capabilities and the requirements of the target platform. This requires balancing tradeoffs between performance needs, hardware constraints, model complexity, and other factors. Thoroughly assessing intended models and use cases and evaluating options against key metrics will guide developers in picking the ideal framework for their embedded ML application.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -658,9 +658,9 @@ "href": "contents/frameworks/frameworks.html#sec-ai-frameworks-resource", "title": "6  AI Frameworks", "section": "6.12 Resources", - "text": "6.12 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nFrameworks overview.\nEmbedded systems software.\nInference engines: TF vs. TFLite.\nTF flavors: TF vs. TFLite vs. TFLite Micro.\nTFLite Micro:\n\nTFLite Micro Big Picture.\nTFLite Micro Interpreter.\nTFLite Micro Model Format.\nTFLite Micro Memory Allocation.\nTFLite Micro NN Operations.\n\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 6.1\nExercise 6.2\nExercise 6.3\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "6.12 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nFrameworks overview.\nEmbedded systems software.\nInference engines: TF vs. TFLite.\nTF flavors: TF vs. TFLite vs. TFLite Micro.\nTFLite Micro:\n\nTFLite Micro Big Picture.\nTFLite Micro Interpreter.\nTFLite Micro Model Format.\nTFLite Micro Memory Allocation.\nTFLite Micro NN Operations.\n\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 6.1\nExercise 6.2\nExercise 6.3\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Workflow", "6  AI Frameworks" ] }, @@ -671,7 +671,7 @@ "section": "", "text": "7.1 Introduction\nTraining is critical for developing accurate and useful AI systems using machine learning. The training aims to create a machine learning model that can generalize to new, unseen data rather than memorizing the training examples. This is done by feeding training data into algorithms that learn patterns from these examples by adjusting internal parameters.\nThe algorithms minimize a loss function, which compares their predictions on the training data to the known labels or solutions, guiding the learning. Effective training often requires high-quality, representative data sets large enough to capture variability in real-world use cases.\nIt also requires choosing an algorithm suited to the task, whether a neural network for computer vision, a reinforcement learning algorithm for robotic control, or a tree-based method for categorical prediction. Careful tuning is needed for the model structure, such as neural network depth and width, and learning parameters like step size and regularization strength.\nTechniques to prevent overfitting like regularization penalties and validation with held-out data, are also important. Overfitting can occur when a model fits the training data too closely, failing to generalize to new data. This can happen if the model is too complex or trained too long.\nTo avoid overfitting, regularization techniques can help constrain the model. One regularization method is adding a penalty term to the loss function that discourages complexity, like the L2 norm of the weights. This penalizes large parameter values. Another technique is dropout, where a percentage of neurons is randomly set to zero during training. This reduces neuron co-adaptation.\nValidation methods also help detect and avoid overfitting. Part of the training data is held out from the training loop as a validation set. The model is evaluated on this data. If validation error increases while training error decreases, overfitting occurs. The training can then be stopped early or regularized more strongly. Regularization and validation enable models to train to maximum capability without overfitting the training data.\nTraining takes significant computing resources, especially for deep neural networks used in computer vision, natural language processing, and other areas. These networks have millions of adjustable weights that must be tuned through extensive training. Hardware improvements and distributed training techniques have enabled training ever larger neural nets that can achieve human-level performance on some tasks.\nIn summary, some key points about training:\nWe will walk you through these details in the rest of the sections. Understanding how to effectively leverage data, algorithms, parameter optimization, and generalization through thorough training is essential for developing capable, deployable AI systems that work robustly in the real world.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -682,7 +682,7 @@ "section": "", "text": "Data is crucial: Machine learning models learn from examples in training data. More high-quality, representative data leads to better model performance. Data needs to be processed and formatted for training.\nAlgorithms learn from data: Different algorithms (neural networks, decision trees, etc.) have different approaches to finding patterns in data. Choosing the right algorithm for the task is important.\nTraining refines model parameters: Model training adjusts internal parameters to find patterns in data. Advanced models like neural networks have many adjustable weights. Training iteratively adjusts weights to minimize a loss function.\nGeneralization is the goal: A model that overfits the training data will not generalize well. Regularization techniques (dropout, early stopping, etc.) reduce overfitting. Validation data is used to evaluate generalization.\nTraining takes compute resources: Training complex models requires significant processing power and time. Hardware improvements and distributed training across GPUs/TPUs have enabled advances.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -693,7 +693,7 @@ "section": "7.2 Mathematics of Neural Networks", "text": "7.2 Mathematics of Neural Networks\nDeep learning has revolutionized machine learning and artificial intelligence, enabling computers to learn complex patterns and make intelligent decisions. The neural network is at the heart of the deep learning revolution, and as discussed in section 3, “Deep Learning Primer,” it is a cornerstone in some of these advancements.\nNeural networks are made up of simple functions layered on each other. Each layer takes in some data, performs some computation, and passes it to the next layer. These layers learn progressively high-level features useful for the tasks the network is trained to perform. For example, in a network trained for image recognition, the input layer may take in pixel values, while the next layers may detect simple shapes like edges. The layers after that may detect more complex shapes like noses, eyes, etc. The final output layer classifies the image as a whole.\nThe network in a neural network refers to how these layers are connected. Each layer’s output is considered a single neuron and is connected to many other neurons in the layers preceding it, forming a “network.” The way these neurons interact is determined by the weights between them, which model synaptic strengths similar to that of a brain’s neuron. The neural network is trained by adjusting these weights. Concretely, the weights are initially set randomly, then input is fed in, the output is compared to the desired result, and finally, the weights are tweaked to improve the network. This process is repeated until the network reliably minimizes the loss, indicating it has learned the patterns in the data.\nHow is this process defined mathematically? Formally, neural networks are mathematical models that consist of alternating linear and nonlinear operations, parameterized by a set of learnable weights that are trained to minimize some loss function. This loss function measures how good our model is concerning fitting our training data, and it produces a numerical value when evaluated on our model against the training data. Training neural networks involves repeatedly evaluating the loss function on many different data points to measure how good our model is, then continuously tweaking the weights of our model using backpropagation so that the loss decreases, ultimately optimizing the model to fit our data.\n\n7.2.1 Neural Network Notation\nDiving into the details, the core of a neural network can be viewed as a sequence of alternating linear and nonlinear operations, as show in Figure 7.1:\n\\[\nL_i = W_i A_{i-1}\n\\]\n\\[\nA_i = F_i(L_{i})\n\\]\n\n\n\n\n\n\nNote\n\n\n\nWhy are the nonlinear operations necessary? If we only had linear layers, the entire network would be equivalent to a single linear layer consisting of the product of the linear operators. Hence, the nonlinear functions play a key role in the power of neural networks as they enhance the neural network’s ability to fit functions.\n\n\n\n\n\n\n\n\nNote\n\n\n\nConvolutions are also linear operators and can be cast as a matrix multiplication.\n\n\n\n\n\n\n\n\nFigure 7.1: Neural network diagram. Credit: astroML.\n\n\n\nWhere \\(A_{0}\\) is a vector input to the neural network (i.e., an image that we want the neural network to classify or some other data that the neural network operates on), \\(A_{n}\\) (where \\(n\\) is the number of layers of the network) is the vector output of the neural network (i.e., a vector of size 10 in the case of classifying pictures of handwritten digits), \\(W_i\\)s are the weights of the neural network that are tweaked at training time to fit our data, and \\(F_{i}\\) is that layer’s nonlinear activation function (i.e., ReLU, softmax, etc.). As defined, the intermediate output of the neural network is a vector of real-valued numbers with dimensions:\n\\[\nL_i, A_i \\in \\mathbb{R}^{d_{i}}\n\\]\nWhere \\(d_{i}\\) is the number of neurons at layer \\(i\\); in the case of the first layer \\(i=0\\), \\(d_{i}\\) is the dimension of the input data, and in the last layer \\(i=n\\), \\(d_{n}\\) is the dimension of the output label. Anything in between can be set arbitrarily and may be viewed as the architecture of the neural network (i.e., the dimensionality of the intermediate layers). The weights, which determine how each layer of the neural network interacts with each other, are matrices of real numbers with shape.\n\\[\nW_i \\in \\mathbb{R}^{d_{i} \\times d_{i-1}}\n\\]\nOur neural network, as defined, performs a sequence of linear and nonlinear operations on the input data (\\(L_{0}\\)) to obtain predictions (\\(L_{n}\\)), which hopefully is a good answer to what we want the neural network to do on the input (i.e., classify if the input image is a cat or not). Our neural network may then be represented succinctly as a function \\(N\\) which takes in an input \\(x \\in \\mathbb{R}^{d_0}\\) parameterized by \\(W_1, ..., W_n\\):\n\\[\nN(x; W_1, ... W_n) = \\text{Let } A_0 = x, \\text{ then output } A_n\n\\]\nNext, we will see how to evaluate this neural network against training data by introducing a loss function.\n\n\n7.2.2 Loss Function as a Measure of Goodness of Fit against Training Data\nAfter defining our neural network, we are given some training data, which is a set of points \\({(x_j, y_j)}\\) for \\(j=1..M\\), and we want to evaluate how good our neural network is at fitting this data. To do this, we introduce a loss function, which is a function that takes the output of the neural network on a particular datapoint (\\(N(x_j; W_1, ..., W_n)\\)) and compares it against the “label” of that particular datapoint (the corresponding \\(y_j\\)), and outputs a single numerical scalar (i.e., one real number) that represents how “good” the neural network fit that particular data point; the final measure of how good the neural network is on the entire dataset is therefore just the average of the losses across all data points.\nThere are many different types of loss functions; for example, in the case of image classification, we might use the cross-entropy loss function, which tells us how well two vectors representing classification predictions compare (i.e., if our prediction predicts that an image is more likely a dog, but the label says it is a cat, it will return a high “loss,” indicating a bad fit).\nMathematically, this loss function is a function that takes in two real-valued vectors of the shape of the label and outputs a single numerical scalar. \\[\nL: \\mathbb{R}^{d_{n}} \\times \\mathbb{R}^{d_{n}} \\longrightarrow \\mathbb{R}\n\\]\nThe loss across the entire dataset can be written as the average loss across all data points in the training data.\n\nLoss Function for Optimizing Neural Network Model on a Dataset \\[\nL_{full} = \\frac{1}{M} \\sum_{j=1}^{M} L(N(x_j; W_1,...W_n), y_j)\n\\]\n\n\n\n7.2.3 Training Neural Networks with Gradient Descent\nNow that we can measure how well our network fits the training data, we can optimize the neural network weights to minimize this loss. At a high level, we tweak the parameters of the real-valued matrices \\(W_i\\)s to minimize the loss function \\(L_{full}\\). Overall, our mathematical objective is\n\nNeural Network Training Objective \\[\nmin_{W_1, ..., W_n} L_{full}\n\\] \\[\n= min_{W_1, ..., W_n} \\frac{1}{M} \\sum_{j=1}^{M} L(N(x_j; W_1,...W_n), y_j)\n\\]\n\nSo, how do we optimize this objective? Recall from calculus that minimizing a function can be done by taking the function’s derivative concerning the input parameters and tweaking the parameters in the gradient direction. This technique is called gradient descent and concretely involves calculating the derivative of the loss function \\(L_{full}\\) concerning \\(W_1, ..., W_n\\) to obtain a gradient for these parameters to take a step in, then updating these parameters in the direction of the gradient. Thus, we can train our neural network using gradient descent, which repeatedly applies the update rule.\n\nGradient Descent Update Rule \\[\nW_i := W_i - \\lambda \\frac{\\partial L_{full}}{\\partial W_i} \\mbox{ for } i=1..n\n\\]\n\n\n\n\n\n\n\nNote\n\n\n\nIn practice, the gradient is computed over a minibatch of data points to improve computational efficiency. This is called stochastic gradient descent or batch gradient descent.\n\n\nWhere \\(\\lambda\\) is the stepsize or learning rate of our tweaks, in training our neural network, we repeatedly perform the step above until convergence, or when the loss no longer decreases.@fig-gradient-descent illustrates this process: we want to reach the minimum point, which’s done by following the gradient (as illustrated with the blue arrows in the figure). This prior approach is known as full gradient descent since we are computing the derivative concerning the entire training data and only then taking a single gradient step; a more efficient approach is to calculate the gradient concerning just a random batch of data points and then taking a step, a process known as batch gradient descent or stochastic gradient descent (Robbins and Monro 1951), which is more efficient since now we are taking many more steps per pass of the entire training data. Next, we will cover the mathematics behind computing the gradient of the loss function concerning the \\(W_i\\)s, a process known as backpropagation.\n\nRobbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407. https://doi.org/10.1214/aoms/1177729586.\n\n\n\n\n\n\nFigure 7.2: Gradient descent. Credit: Towards Data Science.\n\n\n\n\n\n7.2.4 Backpropagation\nTraining neural networks involve repeated applications of the gradient descent algorithm, which involves computing the derivative of the loss function with respect to the \\(W_i\\)s. How do we compute the loss derivative concerning the \\(W_i\\)s, given that the \\(W_i\\)s are nested functions of each other in a deep neural network? The trick is to leverage the chain rule: we can compute the derivative of the loss concerning the \\(W_i\\)s by repeatedly applying the chain rule in a complete process known as backpropagation. Specifically, we can calculate the gradients by computing the derivative of the loss concerning the outputs of the last layer, then progressively use this to compute the derivative of the loss concerning each prior layer to the input layer. This process starts from the end of the network (the layer closest to the output) and progresses backwards, and hence gets its name backpropagation.\nLet’s break this down. We can compute the derivative of the loss concerning the outputs of each layer of the neural network by using repeated applications of the chain rule.\n\\[\n\\frac{\\partial L_{full}}{\\partial L_{n}} = \\frac{\\partial A_{n}}{\\partial L_{n}} \\frac{\\partial L_{full}}{\\partial A_{n}}\n\\]\n\\[\n\\frac{\\partial L_{full}}{\\partial L_{n-1}} = \\frac{\\partial A_{n-1}}{\\partial L_{n-1}} \\frac{\\partial L_{n}}{\\partial A_{n-1}} \\frac{\\partial A_{n}}{\\partial L_{n}} \\frac{\\partial L_{full}}{\\partial A_{n}} \n\\]\nor more generally\n\\[\n\\frac{\\partial L_{full}}{\\partial L_{i}} = \\frac{\\partial A_{i}}{\\partial L_{i}} \\frac{\\partial L_{i+1}}{\\partial A_{i}} ... \\frac{\\partial A_{n}}{\\partial L_{n}} \\frac{\\partial L_{full}}{\\partial A_{n}} \n\\]\n\n\n\n\n\n\nNote\n\n\n\nIn what order should we perform this computation? From a computational perspective, performing the calculations from the end to the front is preferable. (i.e: first compute \\(\\frac{\\partial L_{full}}{\\partial A_{n}}\\) then the prior terms, rather than start in the middle) since this avoids materializing and computing large jacobians. This is because \\(\\ \\frac {\\partial L_{full}}{\\partial A_{n}}\\) is a vector; hence, any matrix operation that includes this term has an output that is squished to be a vector. Thus, performing the computation from the end avoids large matrix-matrix multiplications by ensuring that the intermediate products are vectors.\n\n\n\n\n\n\n\n\nNote\n\n\n\nIn our notation, we assume the intermediate activations \\(A_{i}\\) are column vectors, rather than row vectors, hence the chain rule is \\(\\frac{\\partial L}{\\partial L_{i}} = \\frac{\\partial L_{i+1}}{\\partial L_{i}} ... \\frac{\\partial L}{\\partial L_{n}}\\) rather than \\(\\frac{\\partial L}{\\partial L_{i}} = \\frac{\\partial L}{\\partial L_{n}} ... \\frac{\\partial L_{i+1}}{\\partial L_{i}}\\)\n\n\nAfter computing the derivative of the loss concerning the output of each layer, we can easily obtain the derivative of the loss concerning the parameters, again using the chain rule:\n\\[\n\\frac{\\partial L_{full}}{W_{i}} = \\frac{\\partial L_{i}}{\\partial W_{i}} \\frac{\\partial L_{full}}{\\partial L_{i}}\n\\]\nAnd this is ultimately how the derivatives of the layers’ weights are computed using backpropagation! What does this concretely look like in a specific example? Below, we walk through a specific example of a simple 2-layer neural network on a regression task using an MSE loss function with 100-dimensional inputs and a 30-dimensional hidden layer:\n\nExample of Backpropagation\nSuppose we have a two-layer neural network \\[\nL_1 = W_1 A_{0}\n\\] \\[\nA_1 = ReLU(L_1)\n\\] \\[\nL_2 = W_2 A_{1}\n\\] \\[\nA_2 = ReLU(L_2)\n\\] \\[\nNN(x) = \\mbox{Let } A_{0} = x \\mbox{ then output } A_2\n\\] where \\(W_1 \\in \\mathbb{R}^{30 \\times 100}\\) and \\(W_2 \\in \\mathbb{R}^{1 \\times 30}\\). Furthermore, suppose we use the MSE loss function: \\[\nL(x, y) = (x-y)^2\n\\] We wish to compute \\[\n\\frac{\\partial L(NN(x), y)}{\\partial W_i} \\mbox{ for } i=1,2\n\\] Note the following: \\[\n\\frac{\\partial L(x, y)}{\\partial x} = 2 \\times (x-y)\n\\] \\[\n\\frac{\\partial ReLU(x)}{\\partial x} \\delta = \\left\\{\\begin{array}{lr}\n0 & \\text{for } x \\leq 0 \\\\\n1 & \\text{for } x \\geq 0 \\\\\n\\end{array}\\right\\} \\odot \\delta\n\\] \\[\n\\frac{\\partial WA}{\\partial A} \\delta = W^T \\delta\n\\] \\[\n\\frac{\\partial WA}{\\partial W} \\delta = \\delta A^T\n\\] Then we have \\[\n\\frac{\\partial L(NN(x), y)}{\\partial W_2} = \\frac{\\partial L_2}{\\partial W_2} \\frac{\\partial A_2}{\\partial L_2} \\frac{\\partial L(NN(x), y)}{\\partial A_2}\n\\] \\[\n= (2L(NN(x) - y) \\odot ReLU'(L_2)) A_1^T\n\\] and \\[\n\\frac{\\partial L(NN(x), y)}{\\partial W_1} = \\frac{\\partial L_1}{\\partial W_1} \\frac{\\partial A_1}{\\partial L_1} \\frac{\\partial L_2}{\\partial A_1} \\frac{\\partial A_2}{\\partial L_2} \\frac{\\partial L(NN(x), y)}{\\partial A_2}\n\\] \\[\n= [ReLU'(L_1) \\odot (W_2^T [2L(NN(x) - y) \\odot ReLU'(L_2)])] A_0^T\n\\]\n\n\n\n\n\n\n\nTip\n\n\n\nDouble-check your work by making sure that the shapes are correct!\n\nAll Hadamard products (\\(\\odot\\)) should operate on tensors of the same shape\nAll matrix multiplications should operate on matrices that share a common dimension (i.e., m by n, n by k)\nAll gradients concerning the weights should have the same shape as the weight matrices themselves\n\n\n\nThe entire backpropagation process can be complex, especially for very deep networks. Fortunately, machine learning frameworks like PyTorch support automatic differentiation, which performs backpropagation for us. In these frameworks, we simply need to specify the forward pass, and the derivatives will be automatically computed for us. Nevertheless, it is beneficial to understand the theoretical process that is happening under the hood in these machine-learning frameworks.\n\n\n\n\n\n\nNote\n\n\n\nAs seen above, intermediate activations \\(A_i\\) are reused in backpropagation. To improve performance, these activations are cached from the forward pass to avoid being recomputed. However, activations must be kept in memory between the forward and backward passes, leading to higher memory usage. If the network and batch size are large, this may lead to memory issues. Similarly, the derivatives with respect to each layer’s outputs are cached to avoid recomputation.\n\n\n\n\n\n\n\n\nExercise 7.1: Neural Networks with Backpropagation and Gradient Descent\n\n\n\n\n\nUnlock the math behind powerful neural networks! Deep learning might seem like magic, but it’s rooted in mathematical principles. In this chapter, you’ve broken down neural network notation, loss functions, and the powerful technique of backpropagation. Now, prepare to implement this theory with these Colab notebooks. Dive into the heart of how neural networks learn. You’ll see the math behind backpropagation and gradient descent, updating those weights step-by-step.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -704,7 +704,7 @@ "section": "7.3 Differentiable Computation Graphs", "text": "7.3 Differentiable Computation Graphs\nIn general, stochastic gradient descent using backpropagation can be performed on any computational graph that a user may define, provided that the operations of the computation are differentiable. As such, generic deep learning libraries like PyTorch and Tensorflow allow users to specify their computational process (i.e., neural networks) as a computational graph. Backpropagation is automatically performed via automatic differentiation when stochastic gradient descent is performed on these computational graphs. Framing AI training as an optimization problem on differentiable computation graphs is a general way to understand what is happening under the hood with deep learning systems.\nThe structure depicted in Figure 7.3 showcases a segment of a differentiable computational graph. In this graph, the input ‘x’ is processed through a series of operations: it is first multiplied by a weight matrix ‘W’ (MatMul), then added to a bias ‘b’ (Add), and finally passed to an activation function, Rectified Linear Unit (ReLU). This sequence of operations gives us the output C. The graph’s differentiable nature means that each operation has a well-defined gradient. Automatic differentiation, as implemented in ML frameworks, leverages this property to efficiently compute the gradients of the loss with respect to each parameter in the network (e.g., ‘W’ and ‘b’).\n\n\n\n\n\n\nFigure 7.3: Computational Graph. Credit: TensorFlow.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -713,9 +713,9 @@ "href": "contents/training/training.html#training-data", "title": "7  AI Training", "section": "7.4 Training Data", - "text": "7.4 Training Data\nTo enable effective neural network training, the available data must be split into training, validation, and test sets. The training set is used to train the model parameters. The validation set evaluates the model during training to tune hyperparameters and prevent overfitting. The test set provides an unbiased final evaluation of the trained model’s performance.\nMaintaining clear splits between train, validation, and test sets with representative data is crucial to properly training, tuning, and evaluating models to achieve the best real-world performance. To this end, we will learn about the common pitfalls or mistakes people make when creating these data splits.\nTable 7.1 compares the differences between training, validation, and test data splits:\n\n\n\nTable 7.1: Comparing training, validation, and test data splits.\n\n\n\n\n\n\n\n\n\n\nData Split\nPurpose\nTypical Size\n\n\n\n\nTraining Set\nTrain the model parameters\n60-80% of total data\n\n\nValidation Set\nEvaluate model during training to tune hyperparameters and prevent overfitting\n∼20% of total data\n\n\nTest Set\nProvide unbiased evaluation of final trained model\n∼20% of total data\n\n\n\n\n\n\n\n7.4.1 Dataset Splits\n\nTraining Set\nThe training set is used to train the model. It is the largest subset, typically 60-80% of the total data. The model sees and learns from the training data to make predictions. A sufficiently large and representative training set is required for the model to learn the underlying patterns effectively.\n\n\nValidation Set\nThe validation set evaluates the model during training, usually after each epoch. Typically, 20% of the data is allocated for the validation set. The model does not learn or update its parameters based on the validation data. It is used to tune hyperparameters and make other tweaks to improve training. Monitoring metrics like loss and accuracy on the validation set prevents overfitting on just the training data.\n\n\nTest Set\nThe test set acts as a completely unseen dataset that the model did not see during training. It is used to provide an unbiased evaluation of the final trained model. Typically, 20% of the data is reserved for testing. Maintaining a hold-out test set is vital for obtaining an accurate estimate of how the trained model would perform on real-world unseen data. Data leakage from the test set must be avoided at all costs.\nThe relative proportions of the training, validation, and test sets can vary based on data size and application. However, following the general guidelines for a 60/20/20 split is a good starting point. Careful data splitting ensures models are properly trained, tuned, and evaluated to achieve the best performance.\nThe video below explains how to properly split the dataset into training, validation, and testing sets, ensuring an optimal training process.\n\n\n\n\n7.4.2 Common Pitfalls and Mistakes\n\nInsufficient Training Data\nAllocating too little data to the training set is a common mistake when splitting data that can severely impact model performance. If the training set is too small, the model will not have enough samples to effectively learn the true underlying patterns in the data. This leads to high variance and causes the model to fail to generalize well to new data.\nFor example, if you train an image classification model to recognize handwritten digits, providing only 10 or 20 images per digit class would be completely inadequate. The model would need more examples to capture the wide variances in writing styles, rotations, stroke widths, and other variations.\nAs a rule of thumb, the training set size should be at least hundreds or thousands of examples for most machine learning algorithms to work effectively. Due to the large number of parameters, the training set often needs to be in the tens or hundreds of thousands for deep neural networks, especially those using convolutional layers.\nInsufficient training data typically manifests in symptoms like high error rates on validation/test sets, low model accuracy, high variance, and overfitting on small training set samples. Collecting more quality training data is the solution. Data augmentation techniques can also help virtually increase the size of training data for images, audio, etc.\nCarefully factoring in the model complexity and problem difficulty when allocating training samples is important to ensure sufficient data is available for the model to learn successfully. Following guidelines on minimum training set sizes for different algorithms is also recommended. More training data is needed to maintain the overall success of any machine learning application.\nConsider Figure 7.4 where we try to classify/split datapoints into two categories (here, by color): On the left, overfitting is depicted by a model that has learned the nuances in the training data too well (either the dataset was too small or we ran the model for too long), causing it to follow the noise along with the signal, as indicated by the line’s excessive curves. The right side shows underfitting, where the model’s simplicity prevents it from capturing the dataset’s underlying structure, resulting in a line that does not fit the data well. The center graph represents an ideal fit, where the model balances well between generalization and fitting, capturing the main trend of the data without being swayed by outliers. Although the model is not a perfect fit (it misses some points), we care more about its ability to recognize general patterns rather than idiosyncratic outliers.\n\n\n\n\n\n\nFigure 7.4: Data fitting: overfitting, right fit, and underfitting. Credit: MathWorks.\n\n\n\nFigure 7.5 illustrates the process of fitting the data over time. When training, we search for the “sweet spot” between underfitting and overfitting. At first when the model hasn’t had enough time to learn the patterns in the data, we find ourselves in the underfitting zone, indicated by high error rates on the validation set (remember that the model is trained on the training set and we test its generalizability on the validation set, or data it hasn’t seen before). At some point, we achieve a global minimum for error rates, and ideally we want to stop the training there. If we continue training, the model will start “memorizing” or getting to know the data too well that the error rate starts going back up, since the model will fail to generalize to data it hasn’t seen before.\n\n\n\n\n\n\nFigure 7.5: Fitting the data overtime. Credit: IBM.\n\n\n\nThe video below provides an overview of bias and variance and the relationship between the two concepts and model accuracy.\n\n\n\nData Leakage Between Sets\nData leakage refers to the unintentional transfer of information between the training, validation, and test sets. This violates the fundamental assumption that the splits are completely separated. Data leakage leads to seriously compromised evaluation results and inflated performance metrics.\nA common way data leakage occurs is if some samples from the test set are inadvertently included in the training data. When evaluating the t-set, the model has already seen some of the data, which gives overly optimistic scores. For example, if 2% of the test data leaks into the training set of a binary classifier, it can result in an accuracy boost of up to 20%!\nIf the data splits are not done carefully, more subtle forms of leakage can happen. If the splits are not properly randomized and shuffled, samples close to each other in the dataset may end up across different splits. This creates information bleed through based on proximity in the dataset. Time series data is especially vulnerable unless special cross-validation techniques are used.\nPreventing data leakage requires creating solid separation between splits—no sample should exist in more than one split. Shuffling and randomized splitting help create robust divisions. Cross-validation techniques can be used for more rigorous evaluation. Detecting leakage is difficult, but telltale signs include models doing way better on test vs. validation data.\nData leakage severely compromises the validity of the evaluation because the model has already partially seen the test data. No amount of tuning or complex architectures can substitute for clean data splits. It is better to be conservative and create complete separation between splits to avoid this fundamental mistake in machine learning pipelines.\n\n\nSmall or Unrepresentative Validation Set\nThe validation set evaluates models during training and for hyperparameter tuning. It must be bigger and representative of the real data distribution to provide reliable and stable evaluations during training, making model selection and tuning more difficult.\nFor example, if the validation set only contains 100 samples, the metrics calculated will have a high variance. Due to noise, the accuracy may fluctuate up to 5-10% between epochs. This makes it difficult to know if a drop in validation accuracy is due to overfitting or natural variance. With a larger validation set, say 1000 samples, the metrics will be much more stable.\nAdditionally, if the validation set is not representative, perhaps missing certain subclasses, the estimated skill of the model may be inflated. This could lead to poor hyperparameter choices or premature training stops. Models selected based on such biased validation sets do not generalize well to real data.\nA good rule of thumb is that the validation set size should be at least several hundred samples and up to 10-20% of the training set. The splits should also be stratified, especially if working with imbalanced datasets. A larger validation set representing the original data characteristics is essential for proper model selection and tuning.\nThe validation set should also be manageable, leaving insufficient samples for training. Overall, the validation set is a critical piece of the data-splitting process, and care should be taken to avoid the pitfalls of small, inadequate samples that negatively impact model development.\n\n\nReusing the Test Set Multiple Times\nThe test set is designed to provide an unbiased evaluation of the fully trained model only once at the end of the model development process. Reusing the test set multiple times during development for model evaluation, hyperparameter tuning, model selection, etc., can result in overfitting on the test data.\nIf the test set is reused as part of the validation process, the model may start to see and learn from the test samples. This, coupled with intentionally or unintentionally optimizing model performance on the test set, can artificially inflate metrics like accuracy.\nFor example, suppose the test set is used repeatedly for model selection out of 5 architectures. In that case, the model may achieve 99% test accuracy by memorizing the samples rather than learning generalizable patterns. However, when deployed in the real world, the accuracy of new data could drop by 60%.\nThe best practice is to interact with the test set only once at the end to report unbiased metrics on how the final tuned model would perform in the real world. While developing the model, the validation set should be used for all parameter tuning, model selection, early stopping, etc.\nMaintaining the complete separation of training/validation from the test set is essential to obtain accurate estimates of model performance. Even minor deviations from a single use of the test set could positively bias results and metrics, providing an overly optimistic view of real-world efficacy.\n\n\nSame Data Splits Across Experiments\nWhen comparing different machine learning models or experimenting with various architectures and hyperparameters, using the same data splits for training, validation, and testing across the different experiments can introduce bias and invalidate the comparisons.\nIf the same splits are reused, the evaluation results may be more balanced and accurately measure which model performs better. For example, a certain random data split may favor model A over model B irrespective of the algorithms. Reusing this split will then bias towards model A.\nInstead, the data splits should be randomized or shuffled for each experimental iteration. This ensures that randomness in the sampling of the splits does not confer an unfair advantage to any model.\nWith different splits per experiment, the evaluation becomes more robust. Each model is tested on a wide range of test sets drawn randomly from the overall population, smoothing out variation and removing correlation between results.\nProper practice is to set a random seed before splitting the data for each experiment. Splitting should occur after shuffling/resampling as part of the experimental pipeline. Carrying out comparisons on the same splits violates the i.i.d (independent and identically distributed) assumption required for statistical validity.\nUnique splits are essential for fair model comparisons. Though more compute-intensive, randomized allocation per experiment removes sampling bias and enables valid benchmarking. This highlights the true differences in model performance irrespective of a particular split’s characteristics.\n\n\nInformation Leakage Between Sets\nInformation leakage between the training, validation, and test sets occurs when information from one set inadvertently bleeds into another. This could happen due to flaws in the data-splitting process, which violates the assumption that the sets are mutually exclusive.\nFor example, consider a dataset sorted chronologically. If a simple random split is performed, samples close to each other in the dataset may end up in different splits. Models could then learn from ‘future’ data if test samples are leaked into the training set.\nSimilarly, distribution biases may persist across sets if the splits are not properly shuffled. The training set may contain more than certain outliers in the test set, compromising generalization. Issues like class imbalance may also get amplified if splitting is not stratified.\nAnother case is when datasets have linked, inherently connected samples, such as graphs, networks, or time series data. Naive splitting may isolate connected nodes or time steps into different sets. Models can make invalid assumptions based on partial information.\nPreventing information leakage requires awareness of the dataset’s structure and relationships between samples. Shuffling, stratification, and grouped splitting of related samples can help mitigate leakage. Proper cross-validation procedures should be followed, mindful of temporal or sample proximity.\nSubtle leakage of information between sets undermines model evaluation and training. It creates misleading results on model effectiveness. Data splitting procedures should account for sample relationships and distribution differences to ensure mutual exclusivity between sets.\n\n\nFailing to Stratify Splits\nWhen splitting data into training, validation, and test sets, failing to stratify the splits can result in an uneven representation of the target classes across the splits and introduce sampling bias. This is especially problematic for imbalanced datasets.\nStratified splitting involves sampling data points such that the proportion of output classes is approximately preserved in each split. For example, if performing a 70/30 train-test split on a dataset with 60% negative and 40% positive samples, stratification ensures ~60% negative and ~40% positive examples in both training and test sets.\nWithout stratification, random chance could result in the training split having 70% positive samples while the test has 30% positive samples. The model trained on this skewed training distribution will not generalize well. Class imbalance also compromises model metrics like accuracy.\nStratification works best when done using labels, though proxies like clustering can be used for unsupervised learning. It becomes essential for highly skewed datasets with rare classes that could easily be omitted from splits.\nLibraries like Scikit-Learn have stratified splitting methods built into them. Failing to use them could inadvertently introduce sampling bias and hurt model performance on minority groups. After performing the splits, the overall class balance should be examined to ensure even representation across the splits.\nStratification provides a balanced dataset for both model training and evaluation. Though simple random splitting is easy, mindful of stratification needs, especially for real-world imbalanced data, results in more robust model development and evaluation.\n\n\nIgnoring Time Series Dependencies\nTime series data has an inherent temporal structure with observations depending on past context. Naively splitting time series data into train and test sets without accounting for this dependency leads to data leakage and lookahead bias.\nFor example, simply splitting a time series into the first 70% of training and the last 30% as test data will contaminate the training data with future data points. The model can use this information to “peek” ahead during training.\nThis results in an overly optimistic evaluation of the model’s performance. The model may appear to forecast the future accurately but has actually implicitly learned based on future data, which does not translate to real-world performance.\nProper time series cross-validation techniques, such as forward chaining, should be used to preserve order and dependency. The test set should only contain data points from a future time window that the model was not exposed to for training.\nFailing to account for temporal relationships leads to invalid causality assumptions. If the training data contains future points, the model may also need to learn how to extrapolate forecasts further.\nMaintaining the temporal flow of events and avoiding lookahead bias is key to properly training and testing time series models. This ensures they can truly predict future patterns and not just memorize past training data.\n\n\nNo Unseen Data for Final Evaluation\nA common mistake when splitting data is failing to set aside some portion of the data just for the final evaluation of the completed model. All of the data is used for training, validation, and test sets during development.\nThis leaves no unseen data to get an unbiased estimate of how the final tuned model would perform in the real world. The metrics on the test set used during development may only partially reflect actual model skills.\nFor example, choices like early stopping and hyperparameter tuning are often optimized based on test set performance. This couples the model to the test data. An unseen dataset is needed to break this coupling and get true real-world metrics.\nBest practice is to reserve a portion, such as 20-30% of the full dataset, solely for final model evaluation. This data should not be used for validation, tuning, or model selection during development.\nSaving some unseen data allows for evaluating the completely trained model as a black box on real-world data. This provides reliable metrics to decide whether the model is ready for production deployment.\nFailing to keep an unseen hold-out set for final validation risks optimizing results and overlooking potential failures before model release. Having some fresh data provides a final sanity check on real-world efficacy.\n\n\nOveroptimizing on the Validation Set\nThe validation set is meant to guide the model training process, not serve as additional training data. Overoptimizing the validation set to maximize performance metrics treats it more like a secondary training set, leading to inflated metrics and poor generalization.\nFor example, techniques like extensively tuning hyperparameters or adding data augmentations targeted to boost validation accuracy can cause the model to fit too closely to the validation data. The model may achieve 99% validation accuracy but only 55% test accuracy.\nSimilarly, reusing the validation set for early stopping can also optimize the model specifically for that data. Stopping at the best validation performance overfits noise and fluctuations caused by the small validation size.\nThe validation set serves as a proxy to tune and select models. However, the goal remains maximizing real-world data performance, not the validation set. Minimizing the loss or error on validation data does not automatically translate to good generalization.\nA good approach is to keep the use of the validation set minimal—hyperparameters can be tuned coarsely first on training data, for example. The validation set guides the training but should not influence or alter the model itself. It is a diagnostic, not an optimization tool.\nWhen assessing performance on the validation set, care should be taken not to overfit. Tradeoffs are needed to build models that perform well on the overall population and are not overly tuned to the validation samples.", + "text": "7.4 Training Data\nTo enable effective neural network training, the available data must be split into training, validation, and test sets. The training set is used to train the model parameters. The validation set evaluates the model during training to tune hyperparameters and prevent overfitting. The test set provides an unbiased final evaluation of the trained model’s performance.\nMaintaining clear splits between train, validation, and test sets with representative data is crucial to properly training, tuning, and evaluating models to achieve the best real-world performance. To this end, we will learn about the common pitfalls or mistakes people make when creating these data splits.\nTable 7.1 compares the differences between training, validation, and test data splits:\n\n\n\nTable 7.1: Comparing training, validation, and test data splits.\n\n\n\n\n\n\n\n\n\n\nData Split\nPurpose\nTypical Size\n\n\n\n\nTraining Set\nTrain the model parameters\n60-80% of total data\n\n\nValidation Set\nEvaluate model during training to tune hyperparameters and prevent overfitting\n∼20% of total data\n\n\nTest Set\nProvide unbiased evaluation of final trained model\n∼20% of total data\n\n\n\n\n\n\n\n7.4.1 Dataset Splits\n\nTraining Set\nThe training set is used to train the model. It is the largest subset, typically 60-80% of the total data. The model sees and learns from the training data to make predictions. A sufficiently large and representative training set is required for the model to learn the underlying patterns effectively.\n\n\nValidation Set\nThe validation set evaluates the model during training, usually after each epoch. Typically, 20% of the data is allocated for the validation set. The model does not learn or update its parameters based on the validation data. It is used to tune hyperparameters and make other tweaks to improve training. Monitoring metrics like loss and accuracy on the validation set prevents overfitting on just the training data.\n\n\nTest Set\nThe test set acts as a completely unseen dataset that the model did not see during training. It is used to provide an unbiased evaluation of the final trained model. Typically, 20% of the data is reserved for testing. Maintaining a hold-out test set is vital for obtaining an accurate estimate of how the trained model would perform on real-world unseen data. Data leakage from the test set must be avoided at all costs.\nThe relative proportions of the training, validation, and test sets can vary based on data size and application. However, following the general guidelines for a 60/20/20 split is a good starting point. Careful data splitting ensures models are properly trained, tuned, and evaluated to achieve the best performance.\nVideo 7.1 explains how to properly split the dataset into training, validation, and testing sets, ensuring an optimal training process.\n\n\n\n\n\n\nVideo 7.1: Train/Dev/Test Sets\n\n\n\n\n\n\n\n\n\n7.4.2 Common Pitfalls and Mistakes\n\nInsufficient Training Data\nAllocating too little data to the training set is a common mistake when splitting data that can severely impact model performance. If the training set is too small, the model will not have enough samples to effectively learn the true underlying patterns in the data. This leads to high variance and causes the model to fail to generalize well to new data.\nFor example, if you train an image classification model to recognize handwritten digits, providing only 10 or 20 images per digit class would be completely inadequate. The model would need more examples to capture the wide variances in writing styles, rotations, stroke widths, and other variations.\nAs a rule of thumb, the training set size should be at least hundreds or thousands of examples for most machine learning algorithms to work effectively. Due to the large number of parameters, the training set often needs to be in the tens or hundreds of thousands for deep neural networks, especially those using convolutional layers.\nInsufficient training data typically manifests in symptoms like high error rates on validation/test sets, low model accuracy, high variance, and overfitting on small training set samples. Collecting more quality training data is the solution. Data augmentation techniques can also help virtually increase the size of training data for images, audio, etc.\nCarefully factoring in the model complexity and problem difficulty when allocating training samples is important to ensure sufficient data is available for the model to learn successfully. Following guidelines on minimum training set sizes for different algorithms is also recommended. More training data is needed to maintain the overall success of any machine learning application.\nConsider Figure 7.4 where we try to classify/split datapoints into two categories (here, by color): On the left, overfitting is depicted by a model that has learned the nuances in the training data too well (either the dataset was too small or we ran the model for too long), causing it to follow the noise along with the signal, as indicated by the line’s excessive curves. The right side shows underfitting, where the model’s simplicity prevents it from capturing the dataset’s underlying structure, resulting in a line that does not fit the data well. The center graph represents an ideal fit, where the model balances well between generalization and fitting, capturing the main trend of the data without being swayed by outliers. Although the model is not a perfect fit (it misses some points), we care more about its ability to recognize general patterns rather than idiosyncratic outliers.\n\n\n\n\n\n\nFigure 7.4: Data fitting: overfitting, right fit, and underfitting. Credit: MathWorks.\n\n\n\nFigure 7.5 illustrates the process of fitting the data over time. When training, we search for the “sweet spot” between underfitting and overfitting. At first when the model hasn’t had enough time to learn the patterns in the data, we find ourselves in the underfitting zone, indicated by high error rates on the validation set (remember that the model is trained on the training set and we test its generalizability on the validation set, or data it hasn’t seen before). At some point, we achieve a global minimum for error rates, and ideally we want to stop the training there. If we continue training, the model will start “memorizing” or getting to know the data too well that the error rate starts going back up, since the model will fail to generalize to data it hasn’t seen before.\n\n\n\n\n\n\nFigure 7.5: Fitting the data overtime. Credit: IBM.\n\n\n\nVideo 7.2 provides an overview of bias and variance and the relationship between the two concepts and model accuracy.\n\n\n\n\n\n\nVideo 7.2: Bias/Variance\n\n\n\n\n\n\n\n\nData Leakage Between Sets\nData leakage refers to the unintentional transfer of information between the training, validation, and test sets. This violates the fundamental assumption that the splits are completely separated. Data leakage leads to seriously compromised evaluation results and inflated performance metrics.\nA common way data leakage occurs is if some samples from the test set are inadvertently included in the training data. When evaluating the t-set, the model has already seen some of the data, which gives overly optimistic scores. For example, if 2% of the test data leaks into the training set of a binary classifier, it can result in an accuracy boost of up to 20%!\nIf the data splits are not done carefully, more subtle forms of leakage can happen. If the splits are not properly randomized and shuffled, samples close to each other in the dataset may end up across different splits. This creates information bleed through based on proximity in the dataset. Time series data is especially vulnerable unless special cross-validation techniques are used.\nPreventing data leakage requires creating solid separation between splits—no sample should exist in more than one split. Shuffling and randomized splitting help create robust divisions. Cross-validation techniques can be used for more rigorous evaluation. Detecting leakage is difficult, but telltale signs include models doing way better on test vs. validation data.\nData leakage severely compromises the validity of the evaluation because the model has already partially seen the test data. No amount of tuning or complex architectures can substitute for clean data splits. It is better to be conservative and create complete separation between splits to avoid this fundamental mistake in machine learning pipelines.\n\n\nSmall or Unrepresentative Validation Set\nThe validation set evaluates models during training and for hyperparameter tuning. It must be bigger and representative of the real data distribution to provide reliable and stable evaluations during training, making model selection and tuning more difficult.\nFor example, if the validation set only contains 100 samples, the metrics calculated will have a high variance. Due to noise, the accuracy may fluctuate up to 5-10% between epochs. This makes it difficult to know if a drop in validation accuracy is due to overfitting or natural variance. With a larger validation set, say 1000 samples, the metrics will be much more stable.\nAdditionally, if the validation set is not representative, perhaps missing certain subclasses, the estimated skill of the model may be inflated. This could lead to poor hyperparameter choices or premature training stops. Models selected based on such biased validation sets do not generalize well to real data.\nA good rule of thumb is that the validation set size should be at least several hundred samples and up to 10-20% of the training set. The splits should also be stratified, especially if working with imbalanced datasets. A larger validation set representing the original data characteristics is essential for proper model selection and tuning.\nThe validation set should also be manageable, leaving insufficient samples for training. Overall, the validation set is a critical piece of the data-splitting process, and care should be taken to avoid the pitfalls of small, inadequate samples that negatively impact model development.\n\n\nReusing the Test Set Multiple Times\nThe test set is designed to provide an unbiased evaluation of the fully trained model only once at the end of the model development process. Reusing the test set multiple times during development for model evaluation, hyperparameter tuning, model selection, etc., can result in overfitting on the test data.\nIf the test set is reused as part of the validation process, the model may start to see and learn from the test samples. This, coupled with intentionally or unintentionally optimizing model performance on the test set, can artificially inflate metrics like accuracy.\nFor example, suppose the test set is used repeatedly for model selection out of 5 architectures. In that case, the model may achieve 99% test accuracy by memorizing the samples rather than learning generalizable patterns. However, when deployed in the real world, the accuracy of new data could drop by 60%.\nThe best practice is to interact with the test set only once at the end to report unbiased metrics on how the final tuned model would perform in the real world. While developing the model, the validation set should be used for all parameter tuning, model selection, early stopping, etc.\nMaintaining the complete separation of training/validation from the test set is essential to obtain accurate estimates of model performance. Even minor deviations from a single use of the test set could positively bias results and metrics, providing an overly optimistic view of real-world efficacy.\n\n\nSame Data Splits Across Experiments\nWhen comparing different machine learning models or experimenting with various architectures and hyperparameters, using the same data splits for training, validation, and testing across the different experiments can introduce bias and invalidate the comparisons.\nIf the same splits are reused, the evaluation results may be more balanced and accurately measure which model performs better. For example, a certain random data split may favor model A over model B irrespective of the algorithms. Reusing this split will then bias towards model A.\nInstead, the data splits should be randomized or shuffled for each experimental iteration. This ensures that randomness in the sampling of the splits does not confer an unfair advantage to any model.\nWith different splits per experiment, the evaluation becomes more robust. Each model is tested on a wide range of test sets drawn randomly from the overall population, smoothing out variation and removing correlation between results.\nProper practice is to set a random seed before splitting the data for each experiment. Splitting should occur after shuffling/resampling as part of the experimental pipeline. Carrying out comparisons on the same splits violates the i.i.d (independent and identically distributed) assumption required for statistical validity.\nUnique splits are essential for fair model comparisons. Though more compute-intensive, randomized allocation per experiment removes sampling bias and enables valid benchmarking. This highlights the true differences in model performance irrespective of a particular split’s characteristics.\n\n\nInformation Leakage Between Sets\nInformation leakage between the training, validation, and test sets occurs when information from one set inadvertently bleeds into another. This could happen due to flaws in the data-splitting process, which violates the assumption that the sets are mutually exclusive.\nFor example, consider a dataset sorted chronologically. If a simple random split is performed, samples close to each other in the dataset may end up in different splits. Models could then learn from ‘future’ data if test samples are leaked into the training set.\nSimilarly, distribution biases may persist across sets if the splits are not properly shuffled. The training set may contain more than certain outliers in the test set, compromising generalization. Issues like class imbalance may also get amplified if splitting is not stratified.\nAnother case is when datasets have linked, inherently connected samples, such as graphs, networks, or time series data. Naive splitting may isolate connected nodes or time steps into different sets. Models can make invalid assumptions based on partial information.\nPreventing information leakage requires awareness of the dataset’s structure and relationships between samples. Shuffling, stratification, and grouped splitting of related samples can help mitigate leakage. Proper cross-validation procedures should be followed, mindful of temporal or sample proximity.\nSubtle leakage of information between sets undermines model evaluation and training. It creates misleading results on model effectiveness. Data splitting procedures should account for sample relationships and distribution differences to ensure mutual exclusivity between sets.\n\n\nFailing to Stratify Splits\nWhen splitting data into training, validation, and test sets, failing to stratify the splits can result in an uneven representation of the target classes across the splits and introduce sampling bias. This is especially problematic for imbalanced datasets.\nStratified splitting involves sampling data points such that the proportion of output classes is approximately preserved in each split. For example, if performing a 70/30 train-test split on a dataset with 60% negative and 40% positive samples, stratification ensures ~60% negative and ~40% positive examples in both training and test sets.\nWithout stratification, random chance could result in the training split having 70% positive samples while the test has 30% positive samples. The model trained on this skewed training distribution will not generalize well. Class imbalance also compromises model metrics like accuracy.\nStratification works best when done using labels, though proxies like clustering can be used for unsupervised learning. It becomes essential for highly skewed datasets with rare classes that could easily be omitted from splits.\nLibraries like Scikit-Learn have stratified splitting methods built into them. Failing to use them could inadvertently introduce sampling bias and hurt model performance on minority groups. After performing the splits, the overall class balance should be examined to ensure even representation across the splits.\nStratification provides a balanced dataset for both model training and evaluation. Though simple random splitting is easy, mindful of stratification needs, especially for real-world imbalanced data, results in more robust model development and evaluation.\n\n\nIgnoring Time Series Dependencies\nTime series data has an inherent temporal structure with observations depending on past context. Naively splitting time series data into train and test sets without accounting for this dependency leads to data leakage and lookahead bias.\nFor example, simply splitting a time series into the first 70% of training and the last 30% as test data will contaminate the training data with future data points. The model can use this information to “peek” ahead during training.\nThis results in an overly optimistic evaluation of the model’s performance. The model may appear to forecast the future accurately but has actually implicitly learned based on future data, which does not translate to real-world performance.\nProper time series cross-validation techniques, such as forward chaining, should be used to preserve order and dependency. The test set should only contain data points from a future time window that the model was not exposed to for training.\nFailing to account for temporal relationships leads to invalid causality assumptions. If the training data contains future points, the model may also need to learn how to extrapolate forecasts further.\nMaintaining the temporal flow of events and avoiding lookahead bias is key to properly training and testing time series models. This ensures they can truly predict future patterns and not just memorize past training data.\n\n\nNo Unseen Data for Final Evaluation\nA common mistake when splitting data is failing to set aside some portion of the data just for the final evaluation of the completed model. All of the data is used for training, validation, and test sets during development.\nThis leaves no unseen data to get an unbiased estimate of how the final tuned model would perform in the real world. The metrics on the test set used during development may only partially reflect actual model skills.\nFor example, choices like early stopping and hyperparameter tuning are often optimized based on test set performance. This couples the model to the test data. An unseen dataset is needed to break this coupling and get true real-world metrics.\nBest practice is to reserve a portion, such as 20-30% of the full dataset, solely for final model evaluation. This data should not be used for validation, tuning, or model selection during development.\nSaving some unseen data allows for evaluating the completely trained model as a black box on real-world data. This provides reliable metrics to decide whether the model is ready for production deployment.\nFailing to keep an unseen hold-out set for final validation risks optimizing results and overlooking potential failures before model release. Having some fresh data provides a final sanity check on real-world efficacy.\n\n\nOveroptimizing on the Validation Set\nThe validation set is meant to guide the model training process, not serve as additional training data. Overoptimizing the validation set to maximize performance metrics treats it more like a secondary training set, leading to inflated metrics and poor generalization.\nFor example, techniques like extensively tuning hyperparameters or adding data augmentations targeted to boost validation accuracy can cause the model to fit too closely to the validation data. The model may achieve 99% validation accuracy but only 55% test accuracy.\nSimilarly, reusing the validation set for early stopping can also optimize the model specifically for that data. Stopping at the best validation performance overfits noise and fluctuations caused by the small validation size.\nThe validation set serves as a proxy to tune and select models. However, the goal remains maximizing real-world data performance, not the validation set. Minimizing the loss or error on validation data does not automatically translate to good generalization.\nA good approach is to keep the use of the validation set minimal—hyperparameters can be tuned coarsely first on training data, for example. The validation set guides the training but should not influence or alter the model itself. It is a diagnostic, not an optimization tool.\nWhen assessing performance on the validation set, care should be taken not to overfit. Tradeoffs are needed to build models that perform well on the overall population and are not overly tuned to the validation samples.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -726,7 +726,7 @@ "section": "7.5 Optimization Algorithms", "text": "7.5 Optimization Algorithms\nStochastic gradient descent (SGD) is a simple yet powerful optimization algorithm for training machine learning models. It works by estimating the gradient of the loss function concerning the model parameters using a single training example and then updating the parameters in the direction that reduces the loss.\nWhile conceptually straightforward, SGD needs a few areas for improvement. First, choosing a proper learning rate can be difficult—too small, and progress is very slow; too large, and parameters may oscillate and fail to converge. Second, SGD treats all parameters equally and independently, which may not be ideal in all cases. Finally, vanilla SGD uses only first-order gradient information, which results in slow progress on ill-conditioned problems.\n\n7.5.1 Optimizations\nOver the years, various optimizations have been proposed to accelerate and improve vanilla SGD. Ruder (2016) gives an excellent overview of the different optimizers. Briefly, several commonly used SGD optimization techniques include:\n\nRuder, Sebastian. 2016. “An Overview of Gradient Descent Optimization Algorithms.” ArXiv Preprint abs/1609.04747. https://arxiv.org/abs/1609.04747.\nMomentum: Accumulates a velocity vector in directions of persistent gradient across iterations. This helps accelerate progress by dampening oscillations and maintains progress in consistent directions.\nNesterov Accelerated Gradient (NAG): A variant of momentum that computes gradients at the “look ahead” rather than the current parameter position. This anticipatory update prevents overshooting while the momentum maintains the accelerated progress.\nRMSProp: Divides the learning rate by an exponentially decaying average of squared gradients. This has a similar normalizing effect as Adagrad but does not accumulate the gradients over time, avoiding a rapid decay of learning rates (Hinton 2017).\n\nHinton, Geoffrey. 2017. “Overview of Minibatch Gradient Descent.” University of Toronto; University Lecture.\n\nDuchi, John C., Elad Hazan, and Yoram Singer. 2010. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” In COLT 2010 - the 23rd Conference on Learning Theory, Haifa, Israel, June 27-29, 2010, edited by Adam Tauman Kalai and Mehryar Mohri, 257–69. Omnipress. http://colt2010.haifa.il.ibm.com/papers/COLT2010proceedings.pdf\\#page=265.\nAdagrad: An adaptive learning rate algorithm that maintains a per-parameter learning rate scaled down proportionate to each parameter’s historical sum of gradients. This helps eliminate the need to tune learning rates (Duchi, Hazan, and Singer 2010) manually.\nAdadelta: A modification to Adagrad restricts the window of accumulated past gradients, thus reducing the aggressive decay of learning rates (Zeiler 2012).\n\nZeiler, Matthew D. 2012. “Reinforcement and Systemic Machine Learning for Decision Making.” Wiley. https://doi.org/10.1002/9781118266502.ch6.\n\nKingma, Diederik P., and Jimmy Ba. 2015. “Adam: A Method for Stochastic Optimization.” In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, edited by Yoshua Bengio and Yann LeCun. http://arxiv.org/abs/1412.6980.\nAdam: - Combination of momentum and rmsprop where rmsprop modifies the learning rate based on the average of recent magnitudes of gradients. Displays very fast initial progress and automatically tunes step sizes (Kingma and Ba 2015).\nOf these methods, Adam has widely considered the go-to optimization algorithm for many deep-learning tasks. It consistently outperforms vanilla SGD in terms of training speed and performance. Other optimizers may be better suited in some cases, particularly for simpler models.\n\n\n7.5.2 Tradeoffs\nHere is a pros and cons table for some of the main optimization algorithms for neural network training:\n\n\n\n\n\n\n\n\nAlgorithm\nPros\nCons\n\n\n\n\nMomentum\nFaster convergence due to acceleration along gradients Less oscillation than vanilla SGD\nRequires tuning of momentum parameter\n\n\nNesterov Accelerated Gradient (NAG)\nFaster than standard momentum in some cases Anticipatory updates prevent overshooting\nMore complex to understand intuitively\n\n\nAdagrad\nEliminates need to tune learning rates manually Performs well on sparse gradients\nLearning rate may decay too quickly on dense gradients\n\n\nAdadelta\nLess aggressive learning rate decay than Adagrad\nStill sensitive to initial learning rate value\n\n\nRMSProp\nAutomatically adjusts learning rates Works well in practice\nNo major downsides\n\n\nAdam\nCombination of momentum and adaptive learning rates Efficient and fast convergence\nSlightly worse generalization performance in some cases\n\n\nAMSGrad\nImprovement to Adam addressing generalization issue\nNot as extensively used/tested as Adam\n\n\n\n\n\n7.5.3 Benchmarking Algorithms\nNo single method is best for all problem types. This means we need comprehensive benchmarking to identify the most effective optimizer for specific datasets and models. The performance of algorithms like Adam, RMSProp, and Momentum varies due to batch size, learning rate schedules, model architecture, data distribution, and regularization. These variations underline the importance of evaluating each optimizer under diverse conditions.\nTake Adam, for example, who often excels in computer vision tasks, unlike RMSProp, who may show better generalization in certain natural language processing tasks. Momentum’s strength lies in its acceleration in scenarios with consistent gradient directions, whereas Adagrad’s adaptive learning rates are more suited for sparse gradient problems.\nThis wide array of interactions among optimizers demonstrates the challenge of declaring a single, universally superior algorithm. Each optimizer has unique strengths, making it crucial to evaluate various methods to discover their optimal application conditions empirically.\nA comprehensive benchmarking approach should assess the speed of convergence and factors like generalization error, stability, hyperparameter sensitivity, and computational efficiency, among others. This entails monitoring training and validation learning curves across multiple runs and comparing optimizers on various datasets and models to understand their strengths and weaknesses.\nAlgoPerf, introduced by Dahl et al. (2021), addresses the need for a robust benchmarking system. This platform evaluates optimizer performance using criteria such as training loss curves, generalization error, sensitivity to hyperparameters, and computational efficiency. AlgoPerf tests various optimization methods, including Adam, LAMB, and Adafactor, across different model types like CNNs and RNNs/LSTMs on established datasets. It utilizes containerization and automatic metric collection to minimize inconsistencies and allows for controlled experiments across thousands of configurations, providing a reliable basis for comparing optimizers.\n\nDahl, George E, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, et al. 2021. “CSF Findings in Acute NMDAR and LGI1 AntibodyAssociated Autoimmune Encephalitis.” Neurology Neuroimmunology &Amp; Neuroinflammation 8 (6). https://doi.org/10.1212/nxi.0000000000001086.\nThe insights gained from AlgoPerf and similar benchmarks are invaluable for guiding optimizers’ optimal choice or tuning. By enabling reproducible evaluations, these benchmarks contribute to a deeper understanding of each optimizer’s performance, paving the way for future innovations and accelerated progress in the field.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -735,9 +735,9 @@ "href": "contents/training/training.html#hyperparameter-tuning", "title": "7  AI Training", "section": "7.6 Hyperparameter Tuning", - "text": "7.6 Hyperparameter Tuning\nHyperparameters are important settings in machine learning models that greatly impact how well your models ultimately perform. Unlike other model parameters that are learned during training, hyperparameters are specified by the data scientists or machine learning engineers before training the model.\nChoosing the right hyperparameter values enables your models to learn patterns from data effectively. Some examples of key hyperparameters across ML algorithms include:\n\nNeural networks: Learning rate, batch size, number of hidden units, activation functions\nSupport vector machines: Regularization strength, kernel type and parameters\nRandom forests: Number of trees, tree depth\nK-means: Number of clusters\n\nThe problem is that there are no reliable rules of thumb for choosing optimal hyperparameter configurations—you typically have to try out different values and evaluate performance. This process is called hyperparameter tuning.\nIn the early years of modern deep learning, researchers were still grappling with unstable and slow convergence issues. Common pain points included training losses fluctuating wildly, gradients exploding or vanishing, and extensive trial-and-error needed to train networks reliably. As a result, an early focal point was using hyperparameters to control model optimization. For instance, seminal techniques like batch normalization allowed faster model convergence by tuning aspects of internal covariate shift. Adaptive learning rate methods also mitigated the need for extensive manual schedules. These addressed optimization issues during training, such as uncontrolled gradient divergence. Carefully adapted learning rates are also the primary control factor for achieving rapid and stable convergence even today.\nAs computational capacity expanded exponentially in subsequent years, much larger models could be trained without falling prey to pure numerical optimization issues. The focus shifted towards generalization - though efficient convergence was a core prerequisite. State-of-the-art techniques like Transformers brought in parameters in billions. At such sizes, hyperparameters around capacity, regularization, ensembling, etc., took center stage for tuning rather than only raw convergence metrics.\nThe lesson is that understanding the acceleration and stability of the optimization process itself constitutes the groundwork. Initialization schemes, batch sizes, weight decays, and other training hyperparameters remain indispensable today. Mastering fast and flawless convergence allows practitioners to expand their focus on emerging needs around tuning for metrics like accuracy, robustness, and efficiency at scale.\n\n7.6.1 Search Algorithms\nWhen it comes to the critical process of hyperparameter tuning, there are several sophisticated algorithms that machine learning practitioners rely on to search through the vast space of possible model configurations systematically. Some of the most prominent hyperparameter search algorithms include:\n\nGrid Search: The most basic search method, where you manually define a grid of values to check for each hyperparameter. For example, checking learning rates = [0.01, 0.1, 1] and batch sizes = [32, 64, 128]. The key advantage is simplicity, but exploring all combinations leads to exponential search space explosion. Best for fine-tuning a few parameters.\nRandom Search: Instead of a grid, you define a random distribution per hyperparameter to sample values from during the search. This method is more efficient at searching a vast hyperparameter space. However, it is still somewhat arbitrary compared to more adaptive methods.\nBayesian Optimization: This is an advanced probabilistic approach for adaptive exploration based on a surrogate function to model performance over iterations. It is simple and efficient—it finds highly optimized hyperparameters in fewer evaluation steps. However, it requires more investment in setup (Snoek, Larochelle, and Adams 2012).\nEvolutionary Algorithms: These algorithms Mimic natural selection principles. They generate populations of hyperparameter combinations and evolve them over time-based on performance. These algorithms offer robust search capabilities better suited for complex response surfaces. However, many iterations are required for reasonable convergence.\nNeural Architecture Search: An approach to designing well-performing architectures for neural networks. Traditionally, NAS approaches use some form of reinforcement learning to propose neural network architectures, which are then repeatedly evaluated (Zoph and Le 2023).\n\n\nSnoek, Jasper, Hugo Larochelle, and Ryan P. Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake Tahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, 2960–68. https://proceedings.neurips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html.\n\nZoph, Barret, and Quoc V. Le. 2023. “Cybernetical Intelligence.” Wiley. https://doi.org/10.1002/9781394217519.ch17.\n\n\n7.6.2 System Implications\nHyperparameter tuning can significantly impact time to convergence during model training, directly affecting overall runtime. The right values for key training hyperparameters are crucial for efficient model convergence. For example, the hyperparameter’s learning rate controls the step size during gradient descent optimization. Setting a properly tuned learning rate schedule ensures the optimization algorithm converges quickly towards a good minimum. Too small a learning rate leads to painfully slow convergence, while too large a value causes the losses to fluctuate wildly. Proper tuning ensures rapid movement towards optimal weights and biases.\nSimilarly, the batch size for stochastic gradient descent impacts convergence stability. The right batch size smooths out fluctuations in parameter updates to approach the minimum faster. More batch sizes are needed to avoid noisy convergence, while large batch sizes fail to generalize and slow down convergence due to less frequent parameter updates. Tuning hyperparameters for faster convergence and reduced training duration has direct implications on cost and resource requirements for scaling machine learning systems:\n\nLower computational costs: Shorter time to convergence means lower computational costs for training models. ML training often leverages large cloud computing instances like GPU and TPU clusters that incur heavy hourly charges. Minimizing training time directly reduces this resource rental cost, which tends to dominate ML budgets for organizations. Quicker iteration also lets data scientists experiment more freely within the same budget.\nReduced training time: Reduced training time unlocks opportunities to train more models using the same computational budget. Optimized hyperparameters stretch available resources further, allowing businesses to develop and experiment with more models under resource constraints to maximize performance.\nResource efficiency: Quicker training allows allocating smaller compute instances in the cloud since models require access to the resources for a shorter duration. For example, a one-hour training job allows using less powerful GPU instances compared to multi-hour training, which requires sustained compute access over longer intervals. This achieves cost savings, especially for large workloads.\n\nThere are other benefits as well. For instance, faster convergence reduces pressure on ML engineering teams regarding provisioning training resources. Simple model retraining routines can use lower-powered resources instead of requesting access to high-priority queues for constrained production-grade GPU clusters, freeing up deployment resources for other applications.\n\n\n7.6.3 Auto Tuners\nGiven its importance, there is a wide array of commercial offerings to help with hyperparameter tuning. We will briefly touch on two examples: one focused on optimization for machine learning models targeting microcontrollers and another on cloud-scale ML.\n\nBigML\nSeveral commercial auto-tuning platforms are available to address this problem. One solution is Google’s Vertex AI Cloud, which has extensive integrated support for state-of-the-art tuning techniques.\nOne of the most salient capabilities of Google’s Vertex AI-managed machine learning platform is efficient, integrated hyperparameter tuning for model development. Successfully training performant ML models requires identifying optimal configurations for a set of external hyperparameters that dictate model behavior, posing a challenging high-dimensional search problem. Vertex AI aims to simplify this through Automated Machine Learning (AutoML) tooling.\nSpecifically, data scientists can leverage Vertex AI’s hyperparameter tuning engines by providing a labeled dataset and choosing a model type such as a Neural Network or Random Forest classifier. Vertex launches a Hyperparameter Search job transparently on the backend, fully handling resource provisioning, model training, metric tracking, and result analysis automatically using advanced optimization algorithms.\nUnder the hood, Vertex AutoML employs various search strategies to intelligently explore the most promising hyperparameter configurations based on previous evaluation results. Compared to standard Grid Search or Random Search methods, Bayesian Optimization offers superior sample efficiency, requiring fewer training iterations to arrive at optimized model quality. For more complex neural architecture search spaces, Vertex AutoML utilizes Population-Based Training approaches, which evolve candidate solutions over time analogous to natural selection principles.\nVertex AI aims to democratize state-of-the-art hyperparameter search techniques at the cloud scale for all ML developers, abstracting away the underlying orchestration and execution complexity. Users focus solely on their dataset, model requirements, and accuracy goals, while Vertex manages the tuning cycle, resource allocation, model training, accuracy tracking, and artifact storage under the hood. The result is getting deployment-ready, optimized ML models faster for the target problem.\n\n\nTinyML\nEdge Impulse’s Efficient On-device Neural Network Tuner (EON Tuner) is an automated hyperparameter optimization tool designed to develop microcontroller machine learning models. It streamlines the model development process by automatically finding the best neural network configuration for efficient and accurate deployment on resource-constrained devices.\nThe key functionality of the EON Tuner is as follows. First, developers define the model hyperparameters, such as number of layers, nodes per layer, activation functions, and learning rate annealing schedule. These parameters constitute the search space that will be optimized. Next, the target microcontroller platform is selected, providing embedded hardware constraints. The user can also specify optimization objectives, such as minimizing memory footprint, lowering latency, reducing power consumption, or maximizing accuracy.\nWith the defined search space and optimization goals, the EON Tuner leverages Bayesian hyperparameter optimization to explore possible configurations intelligently. Each prospective configuration is automatically implemented as a full model specification, trained, and evaluated for quality metrics. The continual process balances exploration and exploitation to arrive at optimized settings tailored to the developer’s chosen chip architecture and performance requirements.\nThe EON Tuner frees machine learning engineers from the demandingly iterative process of hand-tuning models by automatically tuning models for embedded deployment. The tool integrates seamlessly into the Edge Impulse workflow, taking models from concept to efficiently optimized implementations on microcontrollers. The expertise encapsulated in EON Tuner regarding ML model optimization for microcontrollers ensures beginner and experienced developers alike can rapidly iterate to models fitting their project needs.\n\n\n\n\n\n\nExercise 7.2: Hyperparameter Tuning\n\n\n\n\n\nGet ready to unlock the secrets of hyperparameter tuning and take your PyTorch models to the next level! Hyperparameters are like the hidden dials and knobs that control your model’s learning superpowers. In this Colab notebook, you’ll team up with Ray Tune to find those perfect hyperparameter combinations. Learn how to define what values to search through, set up your training code for optimization, and let Ray Tune do the heavy lifting. By the end, you’ll be a hyperparameter tuning pro!\n\n\n\n\nThe video below explains the systematic organization of the hyperparameter tuning process.", + "text": "7.6 Hyperparameter Tuning\nHyperparameters are important settings in machine learning models that greatly impact how well your models ultimately perform. Unlike other model parameters that are learned during training, hyperparameters are specified by the data scientists or machine learning engineers before training the model.\nChoosing the right hyperparameter values enables your models to learn patterns from data effectively. Some examples of key hyperparameters across ML algorithms include:\n\nNeural networks: Learning rate, batch size, number of hidden units, activation functions\nSupport vector machines: Regularization strength, kernel type and parameters\nRandom forests: Number of trees, tree depth\nK-means: Number of clusters\n\nThe problem is that there are no reliable rules of thumb for choosing optimal hyperparameter configurations—you typically have to try out different values and evaluate performance. This process is called hyperparameter tuning.\nIn the early years of modern deep learning, researchers were still grappling with unstable and slow convergence issues. Common pain points included training losses fluctuating wildly, gradients exploding or vanishing, and extensive trial-and-error needed to train networks reliably. As a result, an early focal point was using hyperparameters to control model optimization. For instance, seminal techniques like batch normalization allowed faster model convergence by tuning aspects of internal covariate shift. Adaptive learning rate methods also mitigated the need for extensive manual schedules. These addressed optimization issues during training, such as uncontrolled gradient divergence. Carefully adapted learning rates are also the primary control factor for achieving rapid and stable convergence even today.\nAs computational capacity expanded exponentially in subsequent years, much larger models could be trained without falling prey to pure numerical optimization issues. The focus shifted towards generalization - though efficient convergence was a core prerequisite. State-of-the-art techniques like Transformers brought in parameters in billions. At such sizes, hyperparameters around capacity, regularization, ensembling, etc., took center stage for tuning rather than only raw convergence metrics.\nThe lesson is that understanding the acceleration and stability of the optimization process itself constitutes the groundwork. Initialization schemes, batch sizes, weight decays, and other training hyperparameters remain indispensable today. Mastering fast and flawless convergence allows practitioners to expand their focus on emerging needs around tuning for metrics like accuracy, robustness, and efficiency at scale.\n\n7.6.1 Search Algorithms\nWhen it comes to the critical process of hyperparameter tuning, there are several sophisticated algorithms that machine learning practitioners rely on to search through the vast space of possible model configurations systematically. Some of the most prominent hyperparameter search algorithms include:\n\nGrid Search: The most basic search method, where you manually define a grid of values to check for each hyperparameter. For example, checking learning rates = [0.01, 0.1, 1] and batch sizes = [32, 64, 128]. The key advantage is simplicity, but exploring all combinations leads to exponential search space explosion. Best for fine-tuning a few parameters.\nRandom Search: Instead of a grid, you define a random distribution per hyperparameter to sample values from during the search. This method is more efficient at searching a vast hyperparameter space. However, it is still somewhat arbitrary compared to more adaptive methods.\nBayesian Optimization: This is an advanced probabilistic approach for adaptive exploration based on a surrogate function to model performance over iterations. It is simple and efficient—it finds highly optimized hyperparameters in fewer evaluation steps. However, it requires more investment in setup (Snoek, Larochelle, and Adams 2012).\nEvolutionary Algorithms: These algorithms Mimic natural selection principles. They generate populations of hyperparameter combinations and evolve them over time-based on performance. These algorithms offer robust search capabilities better suited for complex response surfaces. However, many iterations are required for reasonable convergence.\nNeural Architecture Search: An approach to designing well-performing architectures for neural networks. Traditionally, NAS approaches use some form of reinforcement learning to propose neural network architectures, which are then repeatedly evaluated (Zoph and Le 2023).\n\n\nSnoek, Jasper, Hugo Larochelle, and Ryan P. Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake Tahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, 2960–68. https://proceedings.neurips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html.\n\nZoph, Barret, and Quoc V. Le. 2023. “Cybernetical Intelligence.” Wiley. https://doi.org/10.1002/9781394217519.ch17.\n\n\n7.6.2 System Implications\nHyperparameter tuning can significantly impact time to convergence during model training, directly affecting overall runtime. The right values for key training hyperparameters are crucial for efficient model convergence. For example, the hyperparameter’s learning rate controls the step size during gradient descent optimization. Setting a properly tuned learning rate schedule ensures the optimization algorithm converges quickly towards a good minimum. Too small a learning rate leads to painfully slow convergence, while too large a value causes the losses to fluctuate wildly. Proper tuning ensures rapid movement towards optimal weights and biases.\nSimilarly, the batch size for stochastic gradient descent impacts convergence stability. The right batch size smooths out fluctuations in parameter updates to approach the minimum faster. More batch sizes are needed to avoid noisy convergence, while large batch sizes fail to generalize and slow down convergence due to less frequent parameter updates. Tuning hyperparameters for faster convergence and reduced training duration has direct implications on cost and resource requirements for scaling machine learning systems:\n\nLower computational costs: Shorter time to convergence means lower computational costs for training models. ML training often leverages large cloud computing instances like GPU and TPU clusters that incur heavy hourly charges. Minimizing training time directly reduces this resource rental cost, which tends to dominate ML budgets for organizations. Quicker iteration also lets data scientists experiment more freely within the same budget.\nReduced training time: Reduced training time unlocks opportunities to train more models using the same computational budget. Optimized hyperparameters stretch available resources further, allowing businesses to develop and experiment with more models under resource constraints to maximize performance.\nResource efficiency: Quicker training allows allocating smaller compute instances in the cloud since models require access to the resources for a shorter duration. For example, a one-hour training job allows using less powerful GPU instances compared to multi-hour training, which requires sustained compute access over longer intervals. This achieves cost savings, especially for large workloads.\n\nThere are other benefits as well. For instance, faster convergence reduces pressure on ML engineering teams regarding provisioning training resources. Simple model retraining routines can use lower-powered resources instead of requesting access to high-priority queues for constrained production-grade GPU clusters, freeing up deployment resources for other applications.\n\n\n7.6.3 Auto Tuners\nGiven its importance, there is a wide array of commercial offerings to help with hyperparameter tuning. We will briefly touch on two examples: one focused on optimization for machine learning models targeting microcontrollers and another on cloud-scale ML.\n\nBigML\nSeveral commercial auto-tuning platforms are available to address this problem. One solution is Google’s Vertex AI Cloud, which has extensive integrated support for state-of-the-art tuning techniques.\nOne of the most salient capabilities of Google’s Vertex AI-managed machine learning platform is efficient, integrated hyperparameter tuning for model development. Successfully training performant ML models requires identifying optimal configurations for a set of external hyperparameters that dictate model behavior, posing a challenging high-dimensional search problem. Vertex AI aims to simplify this through Automated Machine Learning (AutoML) tooling.\nSpecifically, data scientists can leverage Vertex AI’s hyperparameter tuning engines by providing a labeled dataset and choosing a model type such as a Neural Network or Random Forest classifier. Vertex launches a Hyperparameter Search job transparently on the backend, fully handling resource provisioning, model training, metric tracking, and result analysis automatically using advanced optimization algorithms.\nUnder the hood, Vertex AutoML employs various search strategies to intelligently explore the most promising hyperparameter configurations based on previous evaluation results. Compared to standard Grid Search or Random Search methods, Bayesian Optimization offers superior sample efficiency, requiring fewer training iterations to arrive at optimized model quality. For more complex neural architecture search spaces, Vertex AutoML utilizes Population-Based Training approaches, which evolve candidate solutions over time analogous to natural selection principles.\nVertex AI aims to democratize state-of-the-art hyperparameter search techniques at the cloud scale for all ML developers, abstracting away the underlying orchestration and execution complexity. Users focus solely on their dataset, model requirements, and accuracy goals, while Vertex manages the tuning cycle, resource allocation, model training, accuracy tracking, and artifact storage under the hood. The result is getting deployment-ready, optimized ML models faster for the target problem.\n\n\nTinyML\nEdge Impulse’s Efficient On-device Neural Network Tuner (EON Tuner) is an automated hyperparameter optimization tool designed to develop microcontroller machine learning models. It streamlines the model development process by automatically finding the best neural network configuration for efficient and accurate deployment on resource-constrained devices.\nThe key functionality of the EON Tuner is as follows. First, developers define the model hyperparameters, such as number of layers, nodes per layer, activation functions, and learning rate annealing schedule. These parameters constitute the search space that will be optimized. Next, the target microcontroller platform is selected, providing embedded hardware constraints. The user can also specify optimization objectives, such as minimizing memory footprint, lowering latency, reducing power consumption, or maximizing accuracy.\nWith the defined search space and optimization goals, the EON Tuner leverages Bayesian hyperparameter optimization to explore possible configurations intelligently. Each prospective configuration is automatically implemented as a full model specification, trained, and evaluated for quality metrics. The continual process balances exploration and exploitation to arrive at optimized settings tailored to the developer’s chosen chip architecture and performance requirements.\nThe EON Tuner frees machine learning engineers from the demandingly iterative process of hand-tuning models by automatically tuning models for embedded deployment. The tool integrates seamlessly into the Edge Impulse workflow, taking models from concept to efficiently optimized implementations on microcontrollers. The expertise encapsulated in EON Tuner regarding ML model optimization for microcontrollers ensures beginner and experienced developers alike can rapidly iterate to models fitting their project needs.\n\n\n\n\n\n\nExercise 7.2: Hyperparameter Tuning\n\n\n\n\n\nGet ready to unlock the secrets of hyperparameter tuning and take your PyTorch models to the next level! Hyperparameters are like the hidden dials and knobs that control your model’s learning superpowers. In this Colab notebook, you’ll team up with Ray Tune to find those perfect hyperparameter combinations. Learn how to define what values to search through, set up your training code for optimization, and let Ray Tune do the heavy lifting. By the end, you’ll be a hyperparameter tuning pro!\n\n\n\n\nVideo 7.3 explains the systematic organization of the hyperparameter tuning process.\n\n\n\n\n\n\nVideo 7.3: Hyperparameter", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -746,9 +746,9 @@ "href": "contents/training/training.html#regularization", "title": "7  AI Training", "section": "7.7 Regularization", - "text": "7.7 Regularization\nRegularization is a critical technique for improving the performance and generalizability of machine learning models in applied settings. It refers to mathematically constraining or penalizing model complexity to avoid overfitting the training data. Without regularization, complex ML models are prone to overfitting the dataset and memorizing peculiarities and noise in the training set rather than learning meaningful patterns. They may achieve high training accuracy but perform poorly when evaluating new unseen inputs.\nRegularization helps address this problem by placing constraints that favor simpler, more generalizable models that don’t latch onto sampling errors. Techniques like L1/L2 regularization directly penalize large parameter values during training, forcing the model to use the smallest parameters that can adequately explain the signal. Early stopping rules halt training when validation set performance stops improving - before the model starts overfitting.\nAppropriate regularization is crucial when deploying models to new user populations and environments where distribution shifts are likely. For example, an irregularized fraud detection model trained at a bank may work initially but accrue technical debt over time as new fraud patterns emerge.\nRegularizing complex neural networks also offers computational advantages—smaller models require less data augmentation, compute power, and data storage. Regularization also allows for more efficient AI systems, where accuracy, robustness, and resource management are thoughtfully balanced against training set limitations.\nSeveral powerful regularization techniques are commonly used to improve model generalization. Architecting the optimal strategy requires understanding how each method affects model learning and complexity.\n\n7.7.1 L1 and L2\nTwo of the most widely used regularization forms are L1 and L2 regularization. Both penalize model complexity by adding an extra term to the cost function optimized during training. This term grows larger as model parameters increase.\nL2 regularization, also known as ridge regression, adds the sum of squared magnitudes of all parameters multiplied by a coefficient α. This quadratic penalty curtails extreme parameter values more aggressively than L1 techniques. Implementation requires only changing the cost function and tuning α.\n\\[R_{L2}(\\Theta) = \\alpha \\sum_{i=1}^{n}\\theta_{i}^2\\]\nWhere:\n\n\\(R_{L2}(\\Theta)\\) - The L2 regularization term that is added to the cost function\n\n\\(\\alpha\\) - The L2 regularization hyperparameter that controls the strength of regularization\n\\(\\theta_{i}\\) - The ith model parameter\n\\(n\\) - The number of parameters in the model\n\\(\\theta_{i}^2\\) - The square of each parameter\n\nAnd the full L2 regularized cost function is:\n\\[J(\\theta) = L(\\theta) + R_{L2}(\\Theta)\\]\nWhere:\n\n\\(L(\\theta)\\) - The original unregularized cost function\n\\(J(\\theta)\\) - The new regularized cost function\n\nBoth L1 and L2 regularization penalize large weights in the neural network. However, the key difference between L1 and L2 regularization is that L2 regularization penalizes the squares of the parameters rather than the absolute values. This key difference has a considerable impact on the resulting regularized weights. L1 regularization, or lasso regression, utilizes the absolute sum of magnitudes rather than the square multiplied by α. Penalizing the absolute value of weights induces sparsity since the gradient of the errors extrapolates linearly as the weight terms tend towards zero; this is unlike penalizing the squared value of the weights, where the penalty reduces as the weights tend towards 0. By inducing sparsity in the parameter vector, L1 regularization automatically performs feature selection, setting the weights of irrelevant features to zero. Unlike L2 regularization, L1 regularization leads to sparsity as weights are set to 0; in L2 regularization, weights are set to a value very close to 0 but generally never reach exact 0. L1 regularization encourages sparsity and has been used in some works to train sparse networks that may be more hardware efficient (Hoefler et al. 2021).\n\nHoefler, Torsten, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. 2021. “Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks.” https://arxiv.org/abs/2102.00554.\n\\[R_{L1}(\\Theta) = \\alpha \\sum_{i=1}^{n}||\\theta_{i}||\\]\nWhere:\n\n\\(R_{L1}(\\Theta)\\) - The L1 regularization term that is added to the cost function\n\\(\\alpha\\) - The L1 regularization hyperparameter that controls the strength of regularization\n\\(\\theta_{i}\\) - The i-th model parameter\n\\(n\\) - The number of parameters in the model\n\\(||\\theta_{i}||\\) - The L1 norm, which takes the absolute value of each parameter\n\nAnd the full L1 regularized cost function is:\n\\[J(\\theta) = L(\\theta) + R_{L1}(\\Theta)\\]\nWhere:\n\n\\(L(\\theta)\\) - The original unregularized cost function\n\\(J(\\theta)\\) - The new regularized cost function\n\nThe choice between L1 and L2 depends on the expected model complexity and whether intrinsic feature selection is needed. Both require iterative tuning across a validation set to select the optimal α hyperparameter.\nThe two videos below explain how regularization works and can help reduce model overfitting to improve performance.\n\n\n\n\n7.7.2 Dropout\nAnother widely adopted regularization method is dropout (Srivastava et al. 2014). During training, dropout randomly sets a fraction \\(p\\) of node outputs or hidden activations to zero. This encourages greater information distribution across more nodes rather than reliance on a small number of nodes. Come prediction time; the full neural network is used, with intermediate activations scaled by \\(p\\) to maintain output magnitudes. GPU optimizations make implementing dropout efficiently straightforward via frameworks like PyTorch and TensorFlow.\n\nSrivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” J. Mach. Learn. Res. http://jmlr.org/papers/v15/srivastava14a.html.\nLet’s be more pedantic. During training with dropout, each node’s output \\(a_i\\) is passed through a dropout mask \\(r_i\\) before being used by the next layer:\n\\[ ã_i = r_i \\odot a_i \\]\nWhere:\n\n\\(a_i\\) - output of node \\(i\\)\n\\(ã_i\\) - output of node \\(i\\) after dropout\n\\(r_i\\) - independent Bernoulli random variable with probability \\(p\\) of being 1\n\\(\\odot\\) - elementwise multiplication\n\nThis dropout mask \\(r_i\\) randomly sets a fraction \\(1-p\\) of activations to 0 during training, forcing the network to make redundant representations.\nAt test time, the dropout mask is removed, and the activations are rescaled by \\(p\\) to maintain expected output magnitudes:\n\\[ a_i^{test} = p a_i\\]\nWhere:\n\n\\(a_i^{test}\\) - node output at test time\n\\(p\\) - dropout probability hyperparameter\n\nThe key hyperparameter is \\(p\\), the fraction of nodes dropped, often set between 0.2 and 0.5. Larger networks tend to benefit from more dropout, while small networks risk underfitting if too many nodes are cut out. Trial and error combined with monitoring validation performance helps tune the dropout level.\nThe following video discusses the intuition behind the dropout regularization technique and how it works.\n\n\n\n7.7.3 Early Stopping\nThe intuition behind early stopping involves tracking model performance on a held-out validation set across training epochs. At first, increases in training set fitness accompany gains in validation accuracy as the model picks up generalizable patterns. After some point, however, the model starts overfitting - latching onto peculiarities and noise in the training data that don’t apply more broadly. The validation performance peaks and then degrades if training continues. Early stopping rules halt training at this peak to prevent overfitting. This technique demonstrates how ML pipelines must monitor system feedback, not just unquestioningly maximize performance on a static training set. The system’s state evolves, and the optimal endpoints change.\nTherefore, formal early stopping methods require monitoring a metric like validation accuracy or loss after each epoch. Common curves exhibit rapid initial gains that taper off, eventually plateauing and decreasing slightly as overfitting occurs. The optimal stopping point is often between 5 and 15 epochs past the peak, depending on patient thresholds. Tracking multiple metrics can improve signal since variance exists between measures.\nSimple, early-stopping rules stop immediately at the first post-peak degradation. More robust methods introduce a patience parameter—the number of degrading epochs permitted before stopping. This avoids prematurely halting training due to transient fluctuations. Typical patience windows range from 50 to 200 validation batches. Wider windows incur the risk of overfitting. Formal tuning strategies can determine optimal patience.\n\n\n\n\n\n\nExercise 7.3: Regularization\n\n\n\n\n\nBattling Overfitting: Unlock the Secrets of Regularization! Overfitting is like your model memorizing the answers to a practice test, then failing the real exam. Regularization techniques are the study guides that help your model generalize and ace new challenges. In this Colab notebook, you’ll learn how to tune regularization parameters for optimal results using L1 & L2 regularization, dropout, and early stopping.\n\n\n\n\nThe following video covers a few other regularization methods that can reduce model overfitting.", + "text": "7.7 Regularization\nRegularization is a critical technique for improving the performance and generalizability of machine learning models in applied settings. It refers to mathematically constraining or penalizing model complexity to avoid overfitting the training data. Without regularization, complex ML models are prone to overfitting the dataset and memorizing peculiarities and noise in the training set rather than learning meaningful patterns. They may achieve high training accuracy but perform poorly when evaluating new unseen inputs.\nRegularization helps address this problem by placing constraints that favor simpler, more generalizable models that don’t latch onto sampling errors. Techniques like L1/L2 regularization directly penalize large parameter values during training, forcing the model to use the smallest parameters that can adequately explain the signal. Early stopping rules halt training when validation set performance stops improving - before the model starts overfitting.\nAppropriate regularization is crucial when deploying models to new user populations and environments where distribution shifts are likely. For example, an irregularized fraud detection model trained at a bank may work initially but accrue technical debt over time as new fraud patterns emerge.\nRegularizing complex neural networks also offers computational advantages—smaller models require less data augmentation, compute power, and data storage. Regularization also allows for more efficient AI systems, where accuracy, robustness, and resource management are thoughtfully balanced against training set limitations.\nSeveral powerful regularization techniques are commonly used to improve model generalization. Architecting the optimal strategy requires understanding how each method affects model learning and complexity.\n\n7.7.1 L1 and L2\nTwo of the most widely used regularization forms are L1 and L2 regularization. Both penalize model complexity by adding an extra term to the cost function optimized during training. This term grows larger as model parameters increase.\nL2 regularization, also known as ridge regression, adds the sum of squared magnitudes of all parameters multiplied by a coefficient α. This quadratic penalty curtails extreme parameter values more aggressively than L1 techniques. Implementation requires only changing the cost function and tuning α.\n\\[R_{L2}(\\Theta) = \\alpha \\sum_{i=1}^{n}\\theta_{i}^2\\]\nWhere:\n\n\\(R_{L2}(\\Theta)\\) - The L2 regularization term that is added to the cost function\n\n\\(\\alpha\\) - The L2 regularization hyperparameter that controls the strength of regularization\n\\(\\theta_{i}\\) - The ith model parameter\n\\(n\\) - The number of parameters in the model\n\\(\\theta_{i}^2\\) - The square of each parameter\n\nAnd the full L2 regularized cost function is:\n\\[J(\\theta) = L(\\theta) + R_{L2}(\\Theta)\\]\nWhere:\n\n\\(L(\\theta)\\) - The original unregularized cost function\n\\(J(\\theta)\\) - The new regularized cost function\n\nBoth L1 and L2 regularization penalize large weights in the neural network. However, the key difference between L1 and L2 regularization is that L2 regularization penalizes the squares of the parameters rather than the absolute values. This key difference has a considerable impact on the resulting regularized weights. L1 regularization, or lasso regression, utilizes the absolute sum of magnitudes rather than the square multiplied by α. Penalizing the absolute value of weights induces sparsity since the gradient of the errors extrapolates linearly as the weight terms tend towards zero; this is unlike penalizing the squared value of the weights, where the penalty reduces as the weights tend towards 0. By inducing sparsity in the parameter vector, L1 regularization automatically performs feature selection, setting the weights of irrelevant features to zero. Unlike L2 regularization, L1 regularization leads to sparsity as weights are set to 0; in L2 regularization, weights are set to a value very close to 0 but generally never reach exact 0. L1 regularization encourages sparsity and has been used in some works to train sparse networks that may be more hardware efficient (Hoefler et al. 2021).\n\nHoefler, Torsten, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. 2021. “Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks.” https://arxiv.org/abs/2102.00554.\n\\[R_{L1}(\\Theta) = \\alpha \\sum_{i=1}^{n}||\\theta_{i}||\\]\nWhere:\n\n\\(R_{L1}(\\Theta)\\) - The L1 regularization term that is added to the cost function\n\\(\\alpha\\) - The L1 regularization hyperparameter that controls the strength of regularization\n\\(\\theta_{i}\\) - The i-th model parameter\n\\(n\\) - The number of parameters in the model\n\\(||\\theta_{i}||\\) - The L1 norm, which takes the absolute value of each parameter\n\nAnd the full L1 regularized cost function is:\n\\[J(\\theta) = L(\\theta) + R_{L1}(\\Theta)\\]\nWhere:\n\n\\(L(\\theta)\\) - The original unregularized cost function\n\\(J(\\theta)\\) - The new regularized cost function\n\nThe choice between L1 and L2 depends on the expected model complexity and whether intrinsic feature selection is needed. Both require iterative tuning across a validation set to select the optimal α hyperparameter.\nVideo 7.4 and Video 7.5 explains how regularization works.\n\n\n\n\n\n\nVideo 7.4: Regularization\n\n\n\n\n\n\nVideo 7.5 explains how regularization can help reduce model overfitting to improve performance.\n\n\n\n\n\n\nVideo 7.5: Why Regularization Reduces Overfitting\n\n\n\n\n\n\n\n\n7.7.2 Dropout\nAnother widely adopted regularization method is dropout (Srivastava et al. 2014). During training, dropout randomly sets a fraction \\(p\\) of node outputs or hidden activations to zero. This encourages greater information distribution across more nodes rather than reliance on a small number of nodes. Come prediction time; the full neural network is used, with intermediate activations scaled by \\(p\\) to maintain output magnitudes. GPU optimizations make implementing dropout efficiently straightforward via frameworks like PyTorch and TensorFlow.\n\nSrivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” J. Mach. Learn. Res. http://jmlr.org/papers/v15/srivastava14a.html.\nLet’s be more pedantic. During training with dropout, each node’s output \\(a_i\\) is passed through a dropout mask \\(r_i\\) before being used by the next layer:\n\\[ ã_i = r_i \\odot a_i \\]\nWhere:\n\n\\(a_i\\) - output of node \\(i\\)\n\\(ã_i\\) - output of node \\(i\\) after dropout\n\\(r_i\\) - independent Bernoulli random variable with probability \\(p\\) of being 1\n\\(\\odot\\) - elementwise multiplication\n\nThis dropout mask \\(r_i\\) randomly sets a fraction \\(1-p\\) of activations to 0 during training, forcing the network to make redundant representations.\nAt test time, the dropout mask is removed, and the activations are rescaled by \\(p\\) to maintain expected output magnitudes:\n\\[ a_i^{test} = p a_i\\]\nWhere:\n\n\\(a_i^{test}\\) - node output at test time\n\\(p\\) - dropout probability hyperparameter\n\nThe key hyperparameter is \\(p\\), the fraction of nodes dropped, often set between 0.2 and 0.5. Larger networks tend to benefit from more dropout, while small networks risk underfitting if too many nodes are cut out. Trial and error combined with monitoring validation performance helps tune the dropout level.\nVideo 7.6 discusses the intuition behind the dropout regularization technique and how it works.\n\n\n\n\n\n\nVideo 7.6: Dropout\n\n\n\n\n\n\n\n\n7.7.3 Early Stopping\nThe intuition behind early stopping involves tracking model performance on a held-out validation set across training epochs. At first, increases in training set fitness accompany gains in validation accuracy as the model picks up generalizable patterns. After some point, however, the model starts overfitting - latching onto peculiarities and noise in the training data that don’t apply more broadly. The validation performance peaks and then degrades if training continues. Early stopping rules halt training at this peak to prevent overfitting. This technique demonstrates how ML pipelines must monitor system feedback, not just unquestioningly maximize performance on a static training set. The system’s state evolves, and the optimal endpoints change.\nTherefore, formal early stopping methods require monitoring a metric like validation accuracy or loss after each epoch. Common curves exhibit rapid initial gains that taper off, eventually plateauing and decreasing slightly as overfitting occurs. The optimal stopping point is often between 5 and 15 epochs past the peak, depending on patient thresholds. Tracking multiple metrics can improve signal since variance exists between measures.\nSimple, early-stopping rules stop immediately at the first post-peak degradation. More robust methods introduce a patience parameter—the number of degrading epochs permitted before stopping. This avoids prematurely halting training due to transient fluctuations. Typical patience windows range from 50 to 200 validation batches. Wider windows incur the risk of overfitting. Formal tuning strategies can determine optimal patience.\n\n\n\n\n\n\nExercise 7.3: Regularization\n\n\n\n\n\nBattling Overfitting: Unlock the Secrets of Regularization! Overfitting is like your model memorizing the answers to a practice test, then failing the real exam. Regularization techniques are the study guides that help your model generalize and ace new challenges. In this Colab notebook, you’ll learn how to tune regularization parameters for optimal results using L1 & L2 regularization, dropout, and early stopping.\n\n\n\n\nVideo 7.7 covers a few other regularization methods that can reduce model overfitting.\n\n\n\n\n\n\nVideo 7.7: Other Regularization Methods", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -757,9 +757,9 @@ "href": "contents/training/training.html#weight-initialization", "title": "7  AI Training", "section": "7.8 Weight Initialization", - "text": "7.8 Weight Initialization\nProper Initializing the weights in a neural network before training is a vital step directly impacting model performance. Randomly initializing weights to very large or small values can lead to problems like vanishing/exploding gradients, slow convergence of training, or getting trapped in poor local minima. Proper weight initialization accelerates model convergence during training and carries implications for system performance at inference time in production environments. Some key aspects include:\n\nFaster Time-to-Accuracy: Carefully tuned Initialization leads to faster convergence, which results in models reaching target accuracy milestones earlier in the training cycle. For instance, Xavier init could reduce time-to-accuracy by 20% versus bad random init. As training is typically the most time- and compute-intensive phase, this directly enhances ML system velocity and productivity.\nModel Iteration Cycle Efficiency: If models train faster, the overall turnaround time for experimentation, evaluation, and model design iterations decreases significantly. Systems have more flexibility to explore architectures, data pipelines, etc, within given timeframes.\nImpact on Necessary Training Epochs: The training process runs for multiple epochs - with each full pass through the data being an epoch. Good Initialization can reduce the epochs required to converge the loss and accuracy curves on the training set by 10-30%. This means tangible resource and infrastructure cost savings.\nEffect on Training Hyperparameters: Weight initialization parameters interact strongly with certain regularization hyperparameters that govern the training dynamics, like learning rate schedules and dropout probabilities. Finding the right combination of settings is non-trivial. Appropriate Initialization smoothens this search.\n\nWeight initialization has cascading benefits for machine learning engineering efficiency and minimized system resource overhead. It is an easily overlooked tactic that every practitioner should master. The choice of which weight initialization technique to use depends on factors like model architecture (number of layers, connectivity pattern, etc.), activation functions, and the specific problem being solved. Over the years, researchers have developed and empirically verified different initialization strategies targeted to common neural network architectures, which we will discuss here.\n\n7.8.1 Uniform and Normal Initialization\nWhen randomly initializing weights, two standard probability distributions are commonly used - uniform and Gaussian (normal). The uniform distribution sets an equal probability of the initial weight parameters falling anywhere within set minimum and maximum bounds. For example, the bounds could be -1 and 1, leading to a uniform spread of weights between these limits. The Gaussian distribution, on the other hand, concentrates probability around a mean value, following the shape of a bell curve. Most weight values will cluster in the region of the specified mean, with fewer samples towards the extreme ends. The standard deviation (std dev) parameter controls the spread around the mean.\nThe choice between uniform or normal Initialization depends on the network architecture and activation functions. For shallow networks, a normal distribution with a relatively small std dev (e.g., 0.01) is recommended. The bell curve prevents large weight values that could trigger training instability in small networks. For deeper networks, a normal distribution with higher std dev (say 0.5 or above) or uniform distribution may be preferred to account for vanishing gradient issues over many layers. The larger spread drives greater differentiation between neuron behaviors. Fine-tuning the initialization distribution parameters is crucial for stable and speedy model convergence. Monitoring training loss trends can diagnose issues for tweaking the parameters iteratively.\n\n\n7.8.2 Xavier/Glorot Initialization\nProposed by Glorot and Bengio (2010), this initialization technique is specially designed for sigmoid and tanh activation functions. These saturated activations can cause vanishing or exploding gradients during backpropagation over many layers.\n\nGlorot, Xavier, and Yoshua Bengio. 2010. “Understanding the Difficulty of Training Deep Feedforward Neural Networks.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. https://proceedings.mlr.press/v9/glorot10a.html.\nThe Xavier method cleverly sets the variance of the weight distribution based on the number of inputs and outputs to each layer. The intuition is that this balances the flow of information and gradients throughout the network. For example, consider a layer with 300 input units and 100 output units. Plugging this into the formula variance = 2/(#inputs + #outputs) gives a variance of 2/(300+100) = 0.01.\nSampling the initial weights from a uniform or normal distribution centered at 0 with this variance provides much smoother training convergence for deep sigmoid/tanh networks. The gradients are well-conditioned, preventing exponential vanishing or growth.\n\n\n7.8.3 He Initialization\nAs proposed by He et al. (2015), this Initialization is tailored to ReLU (Rectified Linear Unit) activation functions. ReLUs introduce the dying neuron problem where units get stuck outputting all 0s if they receive strong negative inputs initially. This slows and hinders training.\n\nHe, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.” In 2015 IEEE International Conference on Computer Vision (ICCV), 1026–34. IEEE. https://doi.org/10.1109/iccv.2015.123.\nHe overcomes this by sampling weights from a distribution with a variance set based only on the number of inputs per layer, disregarding the outputs. This keeps the incoming signals small enough to activate the ReLUs into their linear regime from the beginning, avoiding dead units. For a layer with 1024 inputs, the formula variance = 2/1024 = 0.002 keeps most weights concentrated closely around 0.\nThis specialized Initialization allows ReLU networks to converge efficiently right from the start. The choice between Xavier and He must match the intended network activation function.\n\n\n\n\n\n\nExercise 7.4: Weight Initialization\n\n\n\n\n\nGet your neural network off to a strong start with weight initialization! How you set those initial weights can make or break your model’s training. Think of it like tuning the instruments in an orchestra before the concert. In this Colab notebook, you’ll learn that the right initialization strategy can save time, improve model performance, and make your deep-learning journey much smoother.\n\n\n\n\nThe video below emphasizes the importance of deliberately selecting initial weight values over random choices.", + "text": "7.8 Weight Initialization\nProper Initializing the weights in a neural network before training is a vital step directly impacting model performance. Randomly initializing weights to very large or small values can lead to problems like vanishing/exploding gradients, slow convergence of training, or getting trapped in poor local minima. Proper weight initialization accelerates model convergence during training and carries implications for system performance at inference time in production environments. Some key aspects include:\n\nFaster Time-to-Accuracy: Carefully tuned Initialization leads to faster convergence, which results in models reaching target accuracy milestones earlier in the training cycle. For instance, Xavier init could reduce time-to-accuracy by 20% versus bad random init. As training is typically the most time- and compute-intensive phase, this directly enhances ML system velocity and productivity.\nModel Iteration Cycle Efficiency: If models train faster, the overall turnaround time for experimentation, evaluation, and model design iterations decreases significantly. Systems have more flexibility to explore architectures, data pipelines, etc, within given timeframes.\nImpact on Necessary Training Epochs: The training process runs for multiple epochs - with each full pass through the data being an epoch. Good Initialization can reduce the epochs required to converge the loss and accuracy curves on the training set by 10-30%. This means tangible resource and infrastructure cost savings.\nEffect on Training Hyperparameters: Weight initialization parameters interact strongly with certain regularization hyperparameters that govern the training dynamics, like learning rate schedules and dropout probabilities. Finding the right combination of settings is non-trivial. Appropriate Initialization smoothens this search.\n\nWeight initialization has cascading benefits for machine learning engineering efficiency and minimized system resource overhead. It is an easily overlooked tactic that every practitioner should master. The choice of which weight initialization technique to use depends on factors like model architecture (number of layers, connectivity pattern, etc.), activation functions, and the specific problem being solved. Over the years, researchers have developed and empirically verified different initialization strategies targeted to common neural network architectures, which we will discuss here.\n\n7.8.1 Uniform and Normal Initialization\nWhen randomly initializing weights, two standard probability distributions are commonly used - uniform and Gaussian (normal). The uniform distribution sets an equal probability of the initial weight parameters falling anywhere within set minimum and maximum bounds. For example, the bounds could be -1 and 1, leading to a uniform spread of weights between these limits. The Gaussian distribution, on the other hand, concentrates probability around a mean value, following the shape of a bell curve. Most weight values will cluster in the region of the specified mean, with fewer samples towards the extreme ends. The standard deviation (std dev) parameter controls the spread around the mean.\nThe choice between uniform or normal Initialization depends on the network architecture and activation functions. For shallow networks, a normal distribution with a relatively small std dev (e.g., 0.01) is recommended. The bell curve prevents large weight values that could trigger training instability in small networks. For deeper networks, a normal distribution with higher std dev (say 0.5 or above) or uniform distribution may be preferred to account for vanishing gradient issues over many layers. The larger spread drives greater differentiation between neuron behaviors. Fine-tuning the initialization distribution parameters is crucial for stable and speedy model convergence. Monitoring training loss trends can diagnose issues for tweaking the parameters iteratively.\n\n\n7.8.2 Xavier/Glorot Initialization\nProposed by Glorot and Bengio (2010), this initialization technique is specially designed for sigmoid and tanh activation functions. These saturated activations can cause vanishing or exploding gradients during backpropagation over many layers.\n\nGlorot, Xavier, and Yoshua Bengio. 2010. “Understanding the Difficulty of Training Deep Feedforward Neural Networks.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. https://proceedings.mlr.press/v9/glorot10a.html.\nThe Xavier method cleverly sets the variance of the weight distribution based on the number of inputs and outputs to each layer. The intuition is that this balances the flow of information and gradients throughout the network. For example, consider a layer with 300 input units and 100 output units. Plugging this into the formula variance = 2/(#inputs + #outputs) gives a variance of 2/(300+100) = 0.01.\nSampling the initial weights from a uniform or normal distribution centered at 0 with this variance provides much smoother training convergence for deep sigmoid/tanh networks. The gradients are well-conditioned, preventing exponential vanishing or growth.\n\n\n7.8.3 He Initialization\nAs proposed by He et al. (2015), this Initialization is tailored to ReLU (Rectified Linear Unit) activation functions. ReLUs introduce the dying neuron problem where units get stuck outputting all 0s if they receive strong negative inputs initially. This slows and hinders training.\n\nHe, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.” In 2015 IEEE International Conference on Computer Vision (ICCV), 1026–34. IEEE. https://doi.org/10.1109/iccv.2015.123.\nHe overcomes this by sampling weights from a distribution with a variance set based only on the number of inputs per layer, disregarding the outputs. This keeps the incoming signals small enough to activate the ReLUs into their linear regime from the beginning, avoiding dead units. For a layer with 1024 inputs, the formula variance = 2/1024 = 0.002 keeps most weights concentrated closely around 0.\nThis specialized Initialization allows ReLU networks to converge efficiently right from the start. The choice between Xavier and He must match the intended network activation function.\n\n\n\n\n\n\nExercise 7.4: Weight Initialization\n\n\n\n\n\nGet your neural network off to a strong start with weight initialization! How you set those initial weights can make or break your model’s training. Think of it like tuning the instruments in an orchestra before the concert. In this Colab notebook, you’ll learn that the right initialization strategy can save time, improve model performance, and make your deep-learning journey much smoother.\n\n\n\n\nVideo 7.8 emphasizes the importance of deliberately selecting initial weight values over random choices.\n\n\n\n\n\n\nVideo 7.8: Weight Initialization", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -770,7 +770,7 @@ "section": "7.9 Activation Functions", "text": "7.9 Activation Functions\nActivation functions play a crucial role in neural networks. They introduce nonlinear behaviors that allow neural nets to model complex patterns. Element-wise activation functions are applied to the weighted sums coming into each neuron in the network. Without activation functions, neural nets would be reduced to linear regression models.\nIdeally, activation functions possess certain desirable qualities:\n\nNonlinear: They enable modeling complex relationships through nonlinear transformations of the input sum.\nDifferentiable: They must have well-defined first derivatives to enable backpropagation and gradient-based optimization during training.\nRange-bounding: They constrain the output signal, preventing an explosion. For example, sigmoid squashes inputs to (0,1).\n\nAdditionally, properties like computational efficiency, monotonicity, and smoothness make some activations better suited over others based on network architecture and problem complexity.\nWe will briefly survey some of the most widely adopted activation functions and their strengths and limitations. We will also provide guidelines for selecting appropriate functions matched to ML system constraints and use case needs.\n\n7.9.1 Sigmoid\nThe sigmoid activation applies a squashing S-shaped curve tightly binding the output between 0 and 1. It has the mathematical form:\n\\[ sigmoid(x) = \\frac{1}{1+e^{-x}} \\]\nThe exponentiation transform allows the function to smoothly transition from near 0 towards near 1 as the input moves from very negative to very positive. The monotonic rise covers the full (0,1) range.\nPros:\nA smooth gradient is always available for a backdrop Output bounded preventing “exploding” Simple formula Cons:\nTendency to saturate at extremes, killing gradients (“vanishing”) Not zero-centered - outputs not symmetrically distributed\n\n\n7.9.2 Tanh\nTanh or hyperbolic tangent also assumes an S-shape but is zero-centered, meaning the average output value is 0.\n\\[ tanh(x) = \\frac{e^x - e^{-x}}{e^x + e^{-x}} \\]\nThe numerator/denominator transform shifts the range from (0,1) in Sigmoid to (-1, 1) in tanh.\nMost pros/cons are shared with Sigmoid, but Tanh avoids some output saturation issues by being centered. However, it still suffers from vanishing gradients with many layers.\n\n\n7.9.3 ReLU\nThe Rectified Linear Unit (ReLU) introduces a simple thresholding behavior with its mathematical form:\n\\[ ReLU(x) = max(0, x) \\]\nIt leaves all positive inputs unchanged while clipping all negative values to 0. This sparse activation and cheap computation make ReLU widely favored over sigmoid/tanh.\nFigure 7.6 demonstrates the 3 activation functions we discussed above -Tanh, ReLU, Sigmoid- in addition to the Linear case.\n\n\n\n\n\n\nFigure 7.6: Common activation functions. Credit: AI Wiki.\n\n\n\n\n\n7.9.4 Softmax\nThe softmax activation function is generally used as the last layer for classification tasks to normalize the activation value vector so that its elements sum to 1. This is useful for classification tasks where we want to learn to predict class-specific probabilities of a particular input, in which case the cumulative probability across classes is equal to 1. The softmax activation function is defined as\n\\[\\sigma(z_i) = \\frac{e^{z_{i}}}{\\sum_{j=1}^K e^{z_{j}}} \\ \\ \\ for\\ i=1,2,\\dots,K\\]\n\n\n7.9.5 Pros and Cons\nHere are the summarizing pros and cons of these various standard activation functions:\n\n\n\n\n\n\n\n\nActivation Function\nPros\nCons\n\n\n\n\nSigmoid\nSmooth gradient for backdrop Output bounded between 0 and 1\nSaturation kills gradients Not zero-centered\n\n\nTanh\nSmoother gradient than sigmoid Zero-centered output [-1, 1]\nStill suffers vanishing gradient issue\n\n\nReLU\nComputationally efficient Introduces sparsity Avoids vanishing gradients\n“Dying ReLU” units Not bounded\n\n\nSoftmax\nUsed for the last layer to normalize vector outputs to be a probability distribution; typically used for classification tasks\n-\n\n\n\n\n\n\n\n\n\nExercise 7.5: Activation Functions\n\n\n\n\n\nUnlock the power of activation functions! These little mathematical workhorses are what make neural networks so incredibly flexible. In this Colab notebook, you’ll go hands-on with functions like the Sigmoid, tanh, and the superstar ReLU. See how they transform inputs and learn which works best in different situations. It’s the key to building neural networks that can tackle complex problems!", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -781,7 +781,7 @@ "section": "7.10 System Bottlenecks", "text": "7.10 System Bottlenecks\nAs introduced earlier, neural networks comprise linear operations (matrix multiplications) interleaved with element-wise nonlinear activation functions. The most computationally expensive portion of neural networks is the linear transformations, specifically the matrix multiplications between each layer. These linear layers map the activations from the previous layer to a higher dimensional space that serves as inputs to the next layer’s activation function.\n\n7.10.1 Runtime Complexity of Matrix Multiplication\n\nLayer Multiplications vs. Activations\nThe bulk of computation in neural networks arises from the matrix multiplications between layers. Consider a neural network layer with an input dimension of \\(M\\) = 500 and output dimension of \\(N\\) = 1000; the matrix multiplication requires \\(O(N \\cdot M) = O(1000 \\cdot 500) = 500,000\\) multiply-accumulate (MAC) operations between those layers.\nContrast this with the preceding layer, which had \\(M\\) = 300 inputs, requiring \\(O(500 \\cdot 300) = 150,000\\) ops. We can see how the computations scale exponentially as the layer widths increase, with the total computations across \\(L\\) layers being \\(\\sum_{l=1}^{L-1} O\\big(N^{(l)} \\cdot M^{(l-1)}\\big)\\).\nNow, comparing the matrix multiplication to the activation function, which requires only \\(O(N) = 1000\\) element-wise nonlinearities for \\(N = 1000\\) outputs, we can see the linear transformations dominating the activations computationally.\nThese large matrix multiplications impact hardware choices, inference latency, and power constraints for real-world neural network applications. For example, a typical DNN layer may require 500,000 multiply-accumulates vs. only 1000 nonlinear activations, demonstrating a 500x increase in mathematical operations.\nWhen training neural networks, we typically use mini-batch gradient descent, operating on small batches of data simultaneously. Considering a batch size of \\(B\\) training examples, the input to the matrix multiplication becomes a \\(M \\times B\\) matrix, while the output is an \\(N \\times B\\) matrix.\n\n\nMini-batch\nIn training neural networks, we need to repeatedly estimate the gradient of the loss function with respect to the network parameters (i.e., weights, and biases). This gradient indicates which direction the parameters should be updated in to minimize the loss. As introduced previously, we perform updates over a batch of data points every update, also known as stochastic gradient descent or mini-batch gradient descent.\nThe most straightforward approach is to estimate the gradient based on a single training example, compute the parameter update, lather, rinse, and repeat for the next example. However, this involves very small and frequent parameter updates that can be computationally inefficient and may need to be more accurate in terms of convergence due to the stochasticity of using just a single data point for a model update.\nInstead, mini-batch gradient descent balances convergence stability and computational efficiency. Rather than computing the gradient on single examples, we estimate the gradient based on small “mini-batches” of data—usually between 8 and 256 examples in practice.\nThis provides a noisy but consistent gradient estimate that leads to more stable convergence. Additionally, the parameter update must only be performed once per mini-batch rather than once per example, reducing computational overhead.\nBy tuning the mini-batch size, we can control the tradeoff between the smoothness of the estimate (larger batches are generally better) and the frequency of updates (smaller batches allow more frequent updates). Mini-batch sizes are usually powers of 2, so they can efficiently leverage parallelism across GPU cores.\nSo, the total computation performs an \\(N \\times M\\) by \\(M \\times B\\) matrix multiplication, yielding \\(O(N \\cdot M \\cdot B)\\) floating point operations. As a numerical example, \\(N=1000\\) hidden units, \\(M=500\\) input units, and a batch size \\(B=64\\) equates to 1000 x 500 x 64 = 32 million multiply-accumulates per training iteration!\nIn contrast, the activation functions are applied element-wise to the \\(N \\times B\\) output matrix, requiring only \\(O(N \\cdot B)\\) computations. For \\(N=1000\\) and \\(B=64\\), that is just 64,000 nonlinearities - 500X less work than the matrix multiplication.\nAs we increase the batch size to fully leverage parallel hardware like GPUs, the discrepancy between matrix multiplication and activation function cost grows even larger. This reveals how optimizing the linear algebra operations offers tremendous efficiency gains.\nTherefore, matrix multiplication is central in analyzing where and how neural networks spend computation. For example, matrix multiplications often account for over 90% of inference latency and training time in common convolutional and recurrent neural networks.\n\n\nOptimizing Matrix Multiplication\nSeveral techniques enhance the efficiency of general dense/sparse matrix-matrix and matrix-vector operations to improve overall efficiency. Some key methods include:\n\nLeveraging optimized math libraries like cuBLAS for GPU acceleration\nEnabling lower precision formats like FP16 or INT8 where accuracy permits\nEmploying Tensor Processing Units with hardware matrix multiplication\nSparsity-aware computations and data storage formats to exploit zero parameters\nApproximating matrix multiplications with algorithms like Fast Fourier Transforms\nModel architecture design to reduce layer widths and activations\nQuantization, pruning, distillation, and other compression techniques\nParallelization of computation across available hardware\nCaching/pre-computing results where possible to reduce redundant operations\n\nThe potential optimization techniques are vast, given the outsized portion of time models spend in matrix and vector math. Even incremental improvements speed up runtimes and lower energy usage. Finding new ways to enhance these linear algebra primitives remains an active area of research aligned with the future demands of machine learning. We will discuss these in detail in the Optimizations and AI Acceleration chapters.\n\n\n\n7.10.2 Compute vs. Memory Bottleneck\nAt this point, matrix-matrix multiplication is the core mathematical operation underpinning neural networks. Both training and inference for neural networks heavily utilize these matrix multiply operations. Analysis shows that over 90% of computational requirements in state-of-the-art neural networks arise from matrix multiplications. Consequently, the performance of matrix multiplication has an enormous influence on overall model training or inference time.\n\nTraining versus Inference\nWhile training and inference rely heavily on matrix multiplication performance, their precise computational profiles differ. Specifically, neural network inference tends to be more compute-bound than training for an equivalent batch size. The key difference lies in the backpropagation pass, which is only required during training. Backpropagation involves a sequence of matrix multiply operations to calculate gradients with respect to activations across each network layer. Critically, though, no additional memory bandwidth is needed here—the inputs, outputs, and gradients are read/written from cache or registers.\nAs a result, training exhibits lower arithmetic intensities, with gradient calculations bounded by memory access instead of FLOPs. In contrast, the forward propagation dominates neural network inference, which corresponds to a series of matrix-matrix multiplies. With no memory-intensive gradient retrospecting, larger batch sizes readily push inference into being extremely compute-bound. The high measured arithmetic intensities exhibit this. Response times may be critical for some inference applications, forcing the application provider to use a smaller batch size to meet these response-time requirements, thereby reducing hardware efficiency; hence, inferences may see lower hardware utilization.\nThe implications are that hardware provisioning and bandwidth vs. FLOP tradeoffs differ depending on whether a system targets training or inference. High-throughput, low-latency servers for inference should emphasize computational power instead of memory, while training clusters require a more balanced architecture.\nHowever, matrix multiplication exhibits an interesting tension - the underlying hardware’s memory bandwidth or arithmetic throughput capabilities can bind it. The system’s ability to fetch and supply matrix data versus its ability to perform computational operations determines this direction.\nThis phenomenon has profound impacts; hardware must be designed judiciously, and software optimizations must be considered. Optimizing and balancing compute versus memory to alleviate this underlying matrix multiplication bottleneck is crucial for efficient model training and deployment.\nFinally, batch size may impact convergence rates during neural network training, another important consideration. For example, there are generally diminishing returns in benefits to convergence with extremely large batch sizes (i.e.,> 16384). In contrast, extremely large batch sizes may be increasingly beneficial from a hardware/arithmetic intensity perspective; using such large batches may not translate to faster convergence vs wall-clock time due to their diminishing benefits to convergence. These tradeoffs are part of the design decisions core to systems for the machine-learning type of research.\n\n\nBatch Size\nThe batch size used during neural network training and inference significantly impacts whether matrix multiplication poses more of a computational or memory bottleneck. Concretely, the batch size refers to the number of samples propagated through the network together in one forward/backward pass. Matrix multiplication equates to larger matrix sizes.\nSpecifically, let’s look at the arithmetic intensity of matrix multiplication during neural network training. This measures the ratio between computational operations and memory transfers. The matrix multiply of two matrices of size \\(N \\times M\\) and \\(M \\times B\\) requires \\(N \\times M \\times B\\) multiply-accumulate operations, but only transfers of \\(N \\times M + M \\times B\\) matrix elements.\nAs we increase the batch size \\(B\\), the number of arithmetic operations grows faster than the memory transfers. For example, with a batch size of 1, we need \\(N \\times M\\) operations and \\(N + M\\) transfers, giving an arithmetic intensity ratio of around \\(\\frac{N \\times M}{N+M}\\). But with a large batch size of 128, the intensity ratio becomes \\(\\frac{128 \\times N \\times M}{N \\times M + M \\times 128} \\approx 128\\). Using a larger batch size shifts the overall computation from memory-bounded to more compute-bounded. AI training uses large batch sizes and is generally limited by peak arithmetic computational performance, i.e., Application 3 in Figure 7.7.\nTherefore, batched matrix multiplication is far more computationally intensive than memory access bound. This has implications for hardware design and software optimizations, which we will cover next. The key insight is that we can significantly alter the computational profile and bottlenecks posed by neural network training and inference by tuning the batch size.\n\n\n\n\n\n\nFigure 7.7: AI training roofline model.\n\n\n\n\n\nHardware Characteristics\nModern hardware like CPUs and GPUs is highly optimized for computational throughput rather than memory bandwidth. For example, high-end H100 Tensor Core GPUs can deliver over 60 TFLOPS of double-precision performance but only provide up to 3 TB/s of memory bandwidth. This means there is almost a 20x imbalance between arithmetic units and memory access; consequently, for hardware like GPU accelerators, neural network training workloads must be made as computationally intensive as possible to utilize the available resources fully.\nThis further motivates the need for using large batch sizes during training. When using a small batch, the matrix multiplication is bounded by memory bandwidth, underutilizing the abundant compute resources. However, we can shift the bottleneck towards computation and attain much higher arithmetic intensity with sufficiently large batches. For instance, batches of 256 or 512 samples may be needed to saturate a high-end GPU. The downside is that larger batches provide less frequent parameter updates, which can impact convergence. Still, the parameter serves as an important tuning knob to balance memory vs compute limitations.\nTherefore, given the imbalanced compute-memory architectures of modern hardware, employing large batch sizes is essential to alleviate bottlenecks and maximize throughput. As mentioned, the subsequent software and algorithms also need to accommodate such batch sizes since larger batch sizes may have diminishing returns toward the network’s convergence. Using very small batch sizes may lead to suboptimal hardware utilization, ultimately limiting training efficiency. Scaling up to large batch sizes is a research topic explored in various works that aim to do large-scale training (You et al. 2018).\n\nYou, Yang, Zhao Zhang, Cho-Jui Hsieh, James Demmel, and Kurt Keutzer. 2018. “ImageNet Training in Minutes.” https://arxiv.org/abs/1709.05011.\n\n\nModel Architectures\nThe underlying neural network architecture also affects whether matrix multiplication poses more of a computational or memory bottleneck during execution. Transformers and MLPs are much more compute-bound than CNN convolutional neural networks. This stems from the types of matrix multiplication operations involved in each model. Transformers rely on self-attention, multiplying large activation matrices by massive parameter matrices to relate elements. MLPs stack fully connected layers, also requiring large matrix multiplies.\nIn contrast, the convolutional layers in CNNs have a sliding window that reuses activations and parameters across the input, which means fewer unique matrix operations are needed. However, the convolutions require repeatedly accessing small input parts and moving partial sums to populate each window. Even though the arithmetic operations in convolutions are intense, this data movement and buffer manipulation impose huge memory access overheads. CNNs comprise several layered stages, so intermediate outputs must frequently materialize in memory.\nAs a result, CNN training tends to be more memory bandwidth bound relative to arithmetic bound compared to Transformers and MLPs. Therefore, the matrix multiplication profile, and in turn, the bottleneck posed, varies significantly based on model choice. Hardware and systems need to be designed with appropriate compute-memory bandwidth balance depending on target model deployment. Models relying more on attention and MLP layers require higher arithmetic throughput compared to CNNs, which necessitates high memory bandwidth.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -790,9 +790,9 @@ "href": "contents/training/training.html#training-parallelization", "title": "7  AI Training", "section": "7.11 Training Parallelization", - "text": "7.11 Training Parallelization\nTraining neural networks entails intensive computational and memory demands. The backpropagation algorithm for calculating gradients and updating weights consists of repeated matrix multiplications and arithmetic operations over the entire dataset. For example, one pass of backpropagation scales in time complexity with \\(O(num\\_parameters \\times batch\\_size \\times sequence\\_length)\\).\nThe computational requirements grow rapidly as model size increases in parameters and layers. Moreover, the algorithm requires storing activation outputs and model parameters for the backward pass, which grows with model size.\nLarger models cannot fit and train on a single accelerator device like a GPU, and the memory footprint becomes prohibitive. Therefore, we need to parallelize model training across multiple devices to provide sufficient compute and memory to train state-of-the-art neural networks.\nAs shown in Figure 7.8, the two main approaches are data parallelism, which replicates the model across devices while splitting the input data batch-wise, and model parallelism, which partitions the model architecture itself across different devices. By training in parallel, we can leverage greater aggregate compute and memory resources to overcome system limitations and accelerate deep learning workloads.\n\n\n\n\n\n\nFigure 7.8: Data parallelism versus model parallelism.\n\n\n\n\n7.11.1 Data Parallel\nData parallelization is a common approach to parallelize machine learning training across multiple processing units, such as GPUs or distributed computing resources. The training dataset is divided into batches in data parallelism, and a separate processing unit processes each batch. The model parameters are then updated based on the gradients computed from the processing of each batch. Here’s a step-by-step description of data parallel parallelization for ML training:\n\nDividing the Dataset: The training dataset is divided into smaller batches, each containing a subset of the training examples.\nReplicating the Model: The neural network model is replicated across all processing units, and each processing unit has its copy of the model.\nParallel Computation: Each processing unit takes a different batch and independently computes the forward and backward passes. During the forward pass, the model makes predictions on the input data. The loss function calculates gradients for the model parameters during the backward pass.\nGradient Aggregation: After processing their respective batches, the gradients from each processing unit are aggregated. Common aggregation methods include summation or averaging of the gradients.\nParameter Update: The aggregated gradients update the model parameters. The update can be performed using optimization algorithms like SGD or variants like Adam.\nSynchronization: After the update, all processing units synchronize their model parameters, ensuring that each has the latest version of the model.\n\nThe prior steps are repeated for several iterations or until convergence.\nLet’s take a specific example. We have 256 batch sizes and 8 GPUs; each GPU will get a micro-batch of 32 samples. Their forward and backward passes compute losses and gradients only based on the local 32 samples. The gradients get aggregated across devices with a parameter server or collective communications library to get the effective gradient for the global batch. Weight updates happen independently on each GPU according to these gradients. After a configured number of iterations, updated weights synchronize and equalize across devices before continuing to the next iterations.\nData parallelism is effective when the model is large, and the dataset is substantial, as it allows for parallel processing of different parts of the data. It is widely used in deep learning frameworks and libraries that support distributed training, such as TensorFlow and PyTorch. However, to ensure efficient parallelization, care must be taken to handle issues like communication overhead, load balancing, and synchronization.\n\n\n7.11.2 Model Parallel\nModel parallelism refers to distributing the neural network model across multiple devices rather than replicating the full model like data parallelism. This is particularly useful when a model is too large to fit into the memory of a single GPU or accelerator device. While this might not be specifically applicable for embedded or TinyML use cases as most models are relatively small(er), it is still useful to know.\nIn model parallel training, different parts or layers of the model are assigned to separate devices. The input activations and intermediate outputs get partitioned and passed between these devices during the forward and backward passes to coordinate gradient computations across model partitions.\nThe memory footprint and computational operations are distributed by splitting the model architecture across multiple devices instead of concentrating on one. This enables training very large models with billions of parameters that otherwise exceed the capacity of a single device. There are several main ways in which we can do partitioning:\n\nLayer-wise parallelism: Consecutive layers are distributed onto different devices. For example, device 1 contains layers 1-3; device 2 contains layers 4-6. The output activations from layer 3 would be transferred to device 2 to start the next layers for the forward pass computations.\nFilter-wise parallelism: In convolutional layers, output filters can be split among devices. Each device computes activation outputs for a subset of filters, which get concatenated before propagating further.\nSpatial parallelism: The input images get divided spatially, so each device processes over a certain region like the top-left quarter of images. The output regions then combine to form the full output.\n\nAdditionally, hybrid combinations can split the model layer-wise and data batch-wise. The appropriate type of model parallelism depends on the specific neural architecture constraints and hardware setup. Optimizing the partitioning and communication for the model topology is key to minimizing overhead.\nHowever, as the model parts run on physically separate devices, they must communicate and synchronize their parameters during each training step. The backward pass must ensure gradient updates propagate accurately across the model partitions. Hence, coordination and high-speed interconnecting between devices are crucial for optimizing the performance of model parallel training. Careful partitioning and communication protocols are required to minimize transfer overhead.\n\n\n7.11.3 Comparison\nTo summarize, Table 7.2 demonstrates some of the key characteristics for comparing data parallelism and model parallelism:\n\n\n\nTable 7.2: Comparing data parallelism and model parallelism.\n\n\n\n\n\n\n\n\n\n\nCharacteristic\nData Parallelism\nModel Parallelism\n\n\n\n\nDefinition\nDistribute data across devices with model replicas\nDistribute model across devices\n\n\nObjective\nAccelerate training through compute scaling\nEnable larger model training\n\n\nScaling Method\nScale devices/workers\nScale model size\n\n\nMain Constraint\nModel size per device\nDevice coordination overhead\n\n\nHardware Requirements\nMultiple GPU/TPUs\nOften specialized interconnect\n\n\nPrimary Challenge\nParameter synchronization\nComplex partitioning + communication\n\n\nTypes\nN/A\nLayer-wise, filter-wise, spatial\n\n\nCode Complexity\nMinimal changes\nMore significant model surgery\n\n\nPopular Libraries\nHorovod, PyTorch Distributed\nMesh TensorFlow", + "text": "7.11 Training Parallelization\nTraining neural networks entails intensive computational and memory demands. The backpropagation algorithm for calculating gradients and updating weights consists of repeated matrix multiplications and arithmetic operations over the entire dataset. For example, one pass of backpropagation scales in time complexity with \\(O(num\\_parameters \\times batch\\_size \\times sequence\\_length)\\).\nThe computational requirements grow rapidly as model size increases in parameters and layers. Moreover, the algorithm requires storing activation outputs and model parameters for the backward pass, which grows with model size.\nLarger models cannot fit and train on a single accelerator device like a GPU, and the memory footprint becomes prohibitive. Therefore, we need to parallelize model training across multiple devices to provide sufficient compute and memory to train state-of-the-art neural networks.\nAs shown in Figure 7.8, the two main approaches are data parallelism, which replicates the model across devices while splitting the input data batch-wise, and model parallelism, which partitions the model architecture itself across different devices. By training in parallel, we can leverage greater aggregate compute and memory resources to overcome system limitations and accelerate deep learning workloads.\n\n\n\n\n\n\nFigure 7.8: Data parallelism versus model parallelism.\n\n\n\n\n7.11.1 Data Parallel\nData parallelization is a common approach to parallelize machine learning training across multiple processing units, such as GPUs or distributed computing resources. The training dataset is divided into batches in data parallelism, and a separate processing unit processes each batch. The model parameters are then updated based on the gradients computed from the processing of each batch. Here’s a step-by-step description of data parallel parallelization for ML training:\n\nDividing the Dataset: The training dataset is divided into smaller batches, each containing a subset of the training examples.\nReplicating the Model: The neural network model is replicated across all processing units, and each processing unit has its copy of the model.\nParallel Computation: Each processing unit takes a different batch and independently computes the forward and backward passes. During the forward pass, the model makes predictions on the input data. The loss function calculates gradients for the model parameters during the backward pass.\nGradient Aggregation: After processing their respective batches, the gradients from each processing unit are aggregated. Common aggregation methods include summation or averaging of the gradients.\nParameter Update: The aggregated gradients update the model parameters. The update can be performed using optimization algorithms like SGD or variants like Adam.\nSynchronization: After the update, all processing units synchronize their model parameters, ensuring that each has the latest version of the model.\n\nThe prior steps are repeated for several iterations or until convergence.\nLet’s take a specific example. We have 256 batch sizes and 8 GPUs; each GPU will get a micro-batch of 32 samples. Their forward and backward passes compute losses and gradients only based on the local 32 samples. The gradients get aggregated across devices with a parameter server or collective communications library to get the effective gradient for the global batch. Weight updates happen independently on each GPU according to these gradients. After a configured number of iterations, updated weights synchronize and equalize across devices before continuing to the next iterations.\nData parallelism is effective when the model is large, and the dataset is substantial, as it allows for parallel processing of different parts of the data. It is widely used in deep learning frameworks and libraries that support distributed training, such as TensorFlow and PyTorch. However, to ensure efficient parallelization, care must be taken to handle issues like communication overhead, load balancing, and synchronization.\n\n\n7.11.2 Model Parallel\nModel parallelism refers to distributing the neural network model across multiple devices rather than replicating the full model like data parallelism. This is particularly useful when a model is too large to fit into the memory of a single GPU or accelerator device. While this might not be specifically applicable for embedded or TinyML use cases as most models are relatively small(er), it is still useful to know.\nIn model parallel training, different parts or layers of the model are assigned to separate devices. The input activations and intermediate outputs get partitioned and passed between these devices during the forward and backward passes to coordinate gradient computations across model partitions.\nThe memory footprint and computational operations are distributed by splitting the model architecture across multiple devices instead of concentrating on one. This enables training very large models with billions of parameters that otherwise exceed the capacity of a single device. There are several main ways in which we can do partitioning:\n\nLayer-wise parallelism: Consecutive layers are distributed onto different devices. For example, device 1 contains layers 1-3; device 2 contains layers 4-6. The output activations from layer 3 would be transferred to device 2 to start the next layers for the forward pass computations.\nFilter-wise parallelism: In convolutional layers, output filters can be split among devices. Each device computes activation outputs for a subset of filters, which get concatenated before propagating further.\nSpatial parallelism: The input images get divided spatially, so each device processes over a certain region like the top-left quarter of images. The output regions then combine to form the full output.\n\nAdditionally, hybrid combinations can split the model layer-wise and data batch-wise. The appropriate type of model parallelism depends on the specific neural architecture constraints and hardware setup. Optimizing the partitioning and communication for the model topology is key to minimizing overhead.\nHowever, as the model parts run on physically separate devices, they must communicate and synchronize their parameters during each training step. The backward pass must ensure gradient updates propagate accurately across the model partitions. Hence, coordination and high-speed interconnecting between devices are crucial for optimizing the performance of model parallel training. Careful partitioning and communication protocols are required to minimize transfer overhead.\n\n\n7.11.3 Comparison\nTo summarize, @tbl-parallelism demonstrates some of the key characteristics for comparing data parallelism and model parallelism:\n\n\n\nTable 7.2: Comparing data parallelism and model parallelism.\n\n\n\n\n\n\n\n\n\n\nCharacteristic\nData Parallelism\nModel Parallelism\n\n\n\n\nDefinition\nDistribute data across devices with model replicas\nDistribute model across devices\n\n\nObjective\nAccelerate training through compute scaling\nEnable larger model training\n\n\nScaling Method\nScale devices/workers\nScale model size\n\n\nMain Constraint\nModel size per device\nDevice coordination overhead\n\n\nHardware Requirements\nMultiple GPU/TPUs\nOften specialized interconnect\n\n\nPrimary Challenge\nParameter synchronization\nComplex partitioning + communication\n\n\nTypes\nN/A\nLayer-wise, filter-wise, spatial\n\n\nCode Complexity\nMinimal changes\nMore significant model surgery\n\n\nPopular Libraries\nHorovod, PyTorch Distributed\nMesh TensorFlow", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -803,7 +803,7 @@ "section": "7.12 Conclusion", "text": "7.12 Conclusion\nIn this chapter, we have covered the core foundations that enable effective training of artificial intelligence models. We explored the mathematical concepts like loss functions, backpropagation, and gradient descent that make neural network optimization possible. We also discussed practical techniques around leveraging training data, regularization, hyperparameter tuning, weight initialization, and distributed parallelization strategies that improve convergence, generalization, and scalability.\nThese methodologies form the bedrock through which the success of deep learning has been attained over the past decade. Mastering these fundamentals equips practitioners to architect systems and refine models tailored to their problem context. However, as models and datasets grow exponentially, training systems must optimize across metrics like time, cost, and carbon footprint. Hardware scaling through warehouse scales enables massive computational throughput - but optimizations around efficiency and specialization will be key. Software techniques like compression and sparsity exploitation can augment hardware gains. We will discuss several of these in the coming chapters.\nOverall, the fundamentals covered in this chapter equip practitioners to build, refine, and deploy models. However, interdisciplinary skills spanning theory, systems, and hardware will differentiate experts who can lift AI to the next level sustainably and responsibly that society requires. Understanding efficiency alongside accuracy constitutes the balanced engineering approach needed to train intelligent systems that integrate smoothly across many real-world contexts.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -812,9 +812,9 @@ "href": "contents/training/training.html#sec-ai-training-resource", "title": "7  AI Training", "section": "7.13 Resources", - "text": "7.13 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nThinking About Loss.\nMinimizing Loss.\nTraining, Validation, and Test Data.\nContinuous Training:\n\nRetraining Trigger.\nData Processing Overview.\nData Ingestion.\nData Validation.\nData Transformation.\nTraining with AutoML.\nContinuous Training with Transfer Learning.\nContinuous Training Use Case Metrics.\nContinuous Training Impact on MLOps.\n\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 7.1\nExercise 7.2\nExercise 7.3\nExercise 7.4\nExercise 7.5\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "7.13 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nThinking About Loss.\nMinimizing Loss.\nTraining, Validation, and Test Data.\nContinuous Training:\n\nRetraining Trigger.\nData Processing Overview.\nData Ingestion.\nData Validation.\nData Transformation.\nTraining with AutoML.\nContinuous Training with Transfer Learning.\nContinuous Training Use Case Metrics.\nContinuous Training Impact on MLOps.\n\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 7.1\nVideo 7.2\nVideo 7.3\nVideo 7.4\nVideo 7.5\nVideo 7.6\nVideo 7.7\nVideo 7.8\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 7.1\nExercise 7.2\nExercise 7.3\nExercise 7.4\nExercise 7.5\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Training", "7  AI Training" ] }, @@ -825,7 +825,7 @@ "section": "", "text": "8.1 Introduction\nTraining models can consume significant energy, sometimes equivalent to the carbon footprint of sizable industrial processes. We will cover some of these sustainability details in the AI Sustainability chapter. On the deployment side, if these models are not optimized for efficiency, they can quickly drain device batteries, demand excessive memory, or fall short of real-time processing needs. Through this introduction, we aim to elucidate the nuances of efficiency, setting the groundwork for a comprehensive exploration in the subsequent chapters.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -836,7 +836,7 @@ "section": "8.2 The Need for Efficient AI", "text": "8.2 The Need for Efficient AI\nEfficiency takes on different connotations depending on where AI computations occur. Let’s revisit and differentiate between Cloud, Edge, and TinyML in terms of efficiency. Figure 8.1 provides a big picture comparison of the three different platforms.\n\n\n\n\n\n\nFigure 8.1: Cloud, Mobile and TinyML. Credit: Schizas et al. (2022).\n\n\nSchizas, Nikolaos, Aristeidis Karras, Christos Karras, and Spyros Sioutas. 2022. “TinyML for Ultra-Low Power AI and Large Scale IoT Deployments: A Systematic Review.” Future Internet 14 (12): 363. https://doi.org/10.3390/fi14120363.\n\n\nFor cloud AI, traditional AI models often run in large-scale data centers equipped with powerful GPUs and TPUs (Barroso, Hölzle, and Ranganathan 2019). Here, efficiency pertains to optimizing computational resources, reducing costs, and ensuring timely data processing and return. However, relying on the cloud introduced latency, especially when dealing with large data streams that must be uploaded, processed, and downloaded.\n\nBarroso, Luiz André, Urs Hölzle, and Parthasarathy Ranganathan. 2019. The Datacenter as a Computer: Designing Warehouse-Scale Machines. Springer International Publishing. https://doi.org/10.1007/978-3-031-01761-2.\n\nLi, En, Liekang Zeng, Zhi Zhou, and Xu Chen. 2020. “Edge AI: On-demand Accelerating Deep Neural Network Inference via Edge Computing.” IEEE Trans. Wireless Commun. 19 (1): 447–57. https://doi.org/10.1109/twc.2019.2946140.\nFor edge AI, edge computing brings AI closer to the data source, processing information directly on local devices like smartphones, cameras, or industrial machines (Li et al. 2020). Here, efficiency encompasses quick real-time responses and reduced data transmission needs. The constraints, however, are tighter—these devices, while more powerful than microcontrollers, have limited computational power compared to cloud setups.\nPushing the frontier even further is TinyML, where AI models run on microcontrollers or extremely resource-constrained environments. The difference in processor and memory performance between TinyML and cloud or mobile systems can be several orders of magnitude (Warden and Situnayake 2019). Efficiency in TinyML is about ensuring models are lightweight enough to fit on these devices, use minimal energy (critical for battery-powered devices), and still perform their tasks effectively.\n\nWarden, Pete, and Daniel Situnayake. 2019. Tinyml: Machine Learning with Tensorflow Lite on Arduino and Ultra-Low-Power Microcontrollers. O’Reilly Media.\nThe spectrum from Cloud to TinyML represents a shift from vast, centralized computational resources to distributed, localized, and constrained environments. As we transition from one to the other, the challenges and strategies related to efficiency evolve, underlining the need for specialized approaches tailored to each scenario. Having underscored the need for efficient AI, especially within the context of TinyML, we will transition to exploring the methodologies devised to meet these challenges. The following sections outline the main concepts we will delve deeper into later. We will demonstrate the breadth and depth of innovation needed to achieve efficient AI as we delve into these strategies.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -847,7 +847,7 @@ "section": "8.3 Efficient Model Architectures", "text": "8.3 Efficient Model Architectures\nChoosing the right model architecture is as crucial as optimizing it. In recent years, researchers have explored some novel architectures that can have inherently fewer parameters while maintaining strong performance.\nMobileNets: MobileNets are efficient mobile and embedded vision application models (Howard et al. 2017). The key idea that led to their success is the use of depth-wise separable convolutions, which significantly reduce the number of parameters and computations in the network. MobileNetV2 and V3 further enhance this design by introducing inverted residuals and linear bottlenecks.\n\nHoward, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” ArXiv Preprint. https://arxiv.org/abs/1704.04861.\n\nIandola, Forrest N, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. “SqueezeNet: Alexnet-level Accuracy with 50x Fewer Parameters and 0.5 MB Model Size.” ArXiv Preprint abs/1602.07360. https://arxiv.org/abs/1602.07360.\nSqueezeNet: SqueezeNet is a class of ML models known for its smaller size without sacrificing accuracy. It achieves this by using a “fire module” that reduces the number of input channels to 3x3 filters, thus reducing the parameters (Iandola et al. 2016). Moreover, it employs delayed downsampling to increase the accuracy by maintaining a larger feature map.\nResNet variants: The Residual Network (ResNet) architecture allows for the introduction of skip connections or shortcuts (He et al. 2016). Some variants of ResNet are designed to be more efficient. For instance, ResNet-SE incorporates the “squeeze and excitation” mechanism to recalibrate feature maps (Hu, Shen, and Sun 2018), while ResNeXt offers grouped convolutions for efficiency (Xie et al. 2017).\n\nHe, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. IEEE. https://doi.org/10.1109/cvpr.2016.90.\n\nHu, Jie, Li Shen, and Gang Sun. 2018. “Squeeze-and-Excitation Networks.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7132–41. IEEE. https://doi.org/10.1109/cvpr.2018.00745.\n\nXie, Saining, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. 2017. “Aggregated Residual Transformations for Deep Neural Networks.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1492–1500. IEEE. https://doi.org/10.1109/cvpr.2017.634.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -858,7 +858,7 @@ "section": "8.4 Efficient Model Compression", "text": "8.4 Efficient Model Compression\nModel compression methods are very important for bringing deep learning models to devices with limited resources. These techniques reduce models’ size, energy consumption, and computational demands without significantly losing accuracy. At a high level, the methods can briefly be binned into the following fundamental methods:\nPruning: This is akin to trimming the branches of a tree. This was first thought of in the Optimal Brain Damage paper (LeCun, Denker, and Solla 1989). This was later popularized in the context of deep learning by Han, Mao, and Dally (2016). Certain weights or even entire neurons are removed from the network in pruning based on specific criteria. This can significantly reduce the model size. Various strategies include weight pruning, neuron pruning, and structured pruning. We will explore these in more detail in Section 9.2.1. Figure 8.2 is an examples of neural network pruning: removing some of the nodes in the inner layers (based on a specific criteria) reduces the numbers of edges between the nodes and, in turn, the size of the model.\n\nLeCun, Yann, John Denker, and Sara Solla. 1989. “Optimal Brain Damage.” Adv Neural Inf Process Syst 2.\n\nHan, Song, Huizi Mao, and William J. Dally. 2016. “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding.” https://arxiv.org/abs/1510.00149.\n\n\n\n\n\n\nFigure 8.2: Neural Network Pruning.\n\n\n\nQuantization: quantization is the process of constraining an input from a large set to output in a smaller set, primarily in deep learning; this means reducing the number of bits that represent the weights and biases of the model. For example, using 16-bit or 8-bit representations instead of 32-bit can reduce the model size and speed up computations, with a minor trade-off in accuracy. We will explore these in more detail in Section 9.3.4. Figure 8.3 shows an example of quantization by rounding to the closest number. The conversion from 32-bit floating point to 16-bit reduces the memory usage by 50%. And going from 32-bit to 8-bit Integer, memory is reduced by 75%. While the loss in numeric precision, and consequently model performance, is minor, the memory usage efficiency is very significant.\n\n\n\n\n\n\nFigure 8.3: Different forms of quantization.\n\n\n\nKnowledge Distillation: Knowledge distillation involves training a smaller model (student) to replicate the behavior of a larger model (teacher). The idea is to transfer the knowledge from the cumbersome model to the lightweight one. Hence, the smaller model attains performance close to its larger counterpart but with significantly fewer parameters. We will explore knowledge distillation in more detail in the Section 9.2.2.1.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -869,7 +869,7 @@ "section": "8.5 Efficient Inference Hardware", "text": "8.5 Efficient Inference Hardware\nTraining: An AI model is an intensive task that requires powerful hardware and can take hours to weeks, but inference needs to be as fast as possible, especially in real-time applications. This is where efficient inference hardware comes into play. We can achieve rapid response times and power-efficient operation by optimizing the hardware specifically for inference tasks, which is especially crucial for edge devices and embedded systems.\nTPUs (Tensor Processing Units): TPUs are custom-built ASICs (Application-Specific Integrated Circuits) by Google to accelerate machine learning workloads (Jouppi et al. 2017). They are optimized for tensor operations, offering high throughput for low-precision arithmetic, and are designed specifically for neural network machine learning. TPUs significantly accelerate model training and inference compared to general-purpose GPU/CPUs. This boost means faster model training and real-time or near-real-time inference capabilities, which are crucial for applications like voice search and augmented reality.\n\nJouppi, Norman P., Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, et al. 2017. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” In Proceedings of the 44th Annual International Symposium on Computer Architecture, 1–12. ISCA ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\nEdge TPUs are a smaller, power-efficient version of Google’s TPUs tailored for edge devices. They provide fast on-device ML inferencing for TensorFlow Lite models. Edge TPUs allow for low-latency, high-efficiency inference on edge devices like smartphones, IoT devices, and embedded systems. AI capabilities can be deployed in real-time applications without communicating with a central server, thus saving bandwidth and reducing latency. Consider the table in Figure 8.4. It shows the performance differences between running different models on CPUs versus a Coral USB accelerator. The Coral USB accelerator is an accessory by Google’s Coral AI platform that lets developers connect Edge TPUs to Linux computers. Running inference on the Edge TPUs was 70 to 100 times faster than on CPUs.\n\n\n\n\n\n\nFigure 8.4: Accelerator vs CPU performance comparison. Credit: TensorFlow Blog.\n\n\n\nNN Accelerators: Fixed-function neural network accelerators are hardware accelerators designed explicitly for neural network computations. They can be standalone chips or part of a larger system-on-chip (SoC) solution. By optimizing the hardware for the specific operations that neural networks require, such as matrix multiplications and convolutions, NN accelerators can achieve faster inference times and lower power consumption than general-purpose CPUs and GPUs. They are especially beneficial in TinyML devices with power or thermal constraints, such as smartwatches, micro-drones, or robotics.\nBut these are all but the most common examples. A number of other types of hardware are emerging that have the potential to offer significant advantages for inference. These include, but are not limited to, neuromorphic hardware, photonic computing, etc. In Section 10.3, we will explore these in greater detail.\nEfficient hardware for inference speeds up the process, saves energy, extends battery life, and can operate in real-time conditions. As AI continues to be integrated into myriad applications- from smart cameras to voice assistants- the role of optimized hardware will only become more prominent. By leveraging these specialized hardware components, developers and engineers can bring the power of AI to devices and situations that were previously unthinkable.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -880,7 +880,7 @@ "section": "8.6 Efficient Numerics", "text": "8.6 Efficient Numerics\nMachine learning, and especially deep learning, involves enormous amounts of computation. Models can have millions to billions of parameters, often trained on vast datasets. Every operation, every multiplication or addition, demands computational resources. Therefore, the precision of the numbers used in these operations can significantly impact the computational speed, energy consumption, and memory requirements. This is where the concept of efficient numerics comes into play.\n\n8.6.1 Numerical Formats\nThere are many different types of numerics. Numerics have a long history in computing systems.\nFloating point: Known as single-precision floating-point, FP32 utilizes 32 bits to represent a number, incorporating its sign, exponent, and fraction. FP32 is widely adopted in many deep learning frameworks and balances accuracy and computational requirements. It’s prevalent in the training phase for many neural networks due to its sufficient precision in capturing minute details during weight updates.\nAlso known as half-precision floating point, FP16 uses 16 bits to represent a number, including its sign, exponent, and fraction. It offers a good balance between precision and memory savings. FP16 is particularly popular in deep learning training on GPUs that support mixed-precision arithmetic, combining the speed benefits of FP16 with the precision of FP32 where needed.\nSeveral other numerical formats fall into an exotic class. An exotic example is BF16 or Brain Floating Point. It is a 16-bit numerical format designed explicitly for deep learning applications. It’s a compromise between FP32 and FP16, retaining the 8-bit exponent from FP32 while reducing the mantissa to 7 bits (as compared to FP32’s 23-bit mantissa). This structure prioritizes range over precision. BF16 has achieved training results comparable in accuracy to FP32 while using significantly less memory and computational resources (Kalamkar et al. 2019). This makes it suitable not just for inference but also for training deep neural networks.\n\nKalamkar, Dhiraj, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, et al. 2019. “A Study of BFLOAT16 for Deep Learning Training.” https://arxiv.org/abs/1905.12322.\nBy retaining the 8-bit exponent of FP32, BF16 offers a similar range, which is crucial for deep learning tasks where certain operations can result in very large or very small numbers. At the same time, by truncating precision, BF16 allows for reduced memory and computational requirements compared to FP32. BF16 has emerged as a promising middle ground in the landscape of numerical formats for deep learning, providing an efficient and effective alternative to the more traditional FP32 and FP16 formats.\nFigure 8.5 shows three different floating-point formats: Float32, Float16, and BFloat16.\n\n\n\n\n\n\nFigure 8.5: Three floating-point formats.\n\n\n\nInteger: These are integer representations using 8, 4, and 2 bits. They are often used during the inference phase of neural networks, where the weights and activations of the model are quantized to these lower precisions. Integer representations are deterministic and offer significant speed and memory advantages over floating-point representations. For many inference tasks, especially on edge devices, the slight loss in accuracy due to quantization is often acceptable, given the efficiency gains. An extreme form of integer numerics is for binary neural networks (BNNs), where weights and activations are constrained to one of two values: +1 or -1.\nVariable bit widths: Beyond the standard widths, research is ongoing into extremely low bit-width numerics, even down to binary or ternary representations. Extremely low bit-width operations can offer significant speedups and further reduce power consumption. While challenges remain in maintaining model accuracy with such drastic quantization, advances continue to be made in this area.\nEfficient numerics is not just about reducing the bit-width of numbers but understanding the trade-offs between accuracy and efficiency. As machine learning models become more pervasive, especially in real-world, resource-constrained environments, the focus on efficient numerics will continue to grow. By thoughtfully selecting and leveraging the appropriate numeric precision, one can achieve robust model performance while optimizing for speed, memory, and energy. Table 8.1 summarizes these trade-offs.\n\n\n\nTable 8.1: Comparing precision levels in deep learning.\n\n\n\n\n\n\n\n\n\n\nPrecision\nPros\nCons\n\n\n\n\nFP32 (Floating Point 32-bit)\nStandard precision used in most deep learning frameworks. High accuracy due to ample representational capacity. Well-suited for training\nHigh memory usage. Slower inference times compared to quantized models. Higher energy consumption.\n\n\nFP16 (Floating Point 16-bit)\nReduces memory usage compared to FP32. Speeds up computations on hardware that supports FP16. Often used in mixed-precision training to balance speed and accuracy.\nLower representational capacity compared to FP32. Risk of numerical instability in some models or layers.\n\n\nINT8 (8-bit Integer)\nSignificantly reduced memory footprint compared to floating-point representations. Faster inference if hardware supports INT8 computations. Suitable for many post-training quantization scenarios.\nQuantization can lead to some accuracy loss. Requires careful calibration during quantization to minimize accuracy degradation.\n\n\nINT4 (4-bit Integer)\nEven lower memory usage than INT8.< br//> Further speedup potential for inference.\nHigher risk of accuracy loss compared to INT8. Calibration during quantization becomes more critical.\n\n\nBinary\nMinimal memory footprint (only 1 bit per parameter). Extremely fast inference due to bitwise operations. Power efficient.\nSignificant accuracy drop for many tasks. Complex training dynamics due to extreme quantization.\n\n\nTernary\nLow memory usage but slightly more than binary. Offers a middle ground between representation and efficiency.\naccuracy might still be lower than that of higher precision models. Training dynamics can be complex.\n\n\n\n\n\n\n\n\n8.6.2 Efficiency Benefits\nNumerical efficiency matters for machine learning workloads for several reasons:\nComputational Efficiency : High-precision computations (like FP32 or FP64) can be slow and resource-intensive. Reducing numeric precision can achieve faster computation times, especially on specialized hardware that supports lower precision.\nMemory Efficiency: Storage requirements decrease with reduced numeric precision. For instance, FP16 requires half the memory of FP32. This is crucial when deploying models to edge devices with limited memory or working with large models.\nPower Efficiency: Lower precision computations often consume less power, which is especially important for battery-operated devices.\nNoise Introduction: Interestingly, the noise introduced using lower precision can sometimes act as a regularizer, helping to prevent overfitting in some models.\nHardware Acceleration: Many modern AI accelerators and GPUs are optimized for lower precision operations, leveraging the efficiency benefits of such numerics.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -891,7 +891,7 @@ "section": "8.7 Evaluating Models", "text": "8.7 Evaluating Models\nIt’s worth noting that the actual benefits and trade-offs can vary based on the specific architecture of the neural network, the dataset, the task, and the hardware being used. Before deciding on a numeric precision, it’s advisable to perform experiments to evaluate the impact on the desired application.\n\n8.7.1 Efficiency Metrics\nA deep understanding of model evaluation methods is important to guide this process systematically. When assessing AI models’ effectiveness and suitability for various applications, efficiency metrics come to the forefront.\nFLOPs (Floating Point Operations) gauge a model’s computational demands. For instance, a modern neural network like BERT has billions of FLOPs, which might be manageable on a powerful cloud server but would be taxing on a smartphone. Higher FLOPs can lead to more prolonged inference times and significant power drain, especially on devices without specialized hardware accelerators. Hence, for real-time applications such as video streaming or gaming, models with lower FLOPs might be more desirable.\nMemory Usage pertains to how much storage the model requires, affecting both the deploying device’s storage and RAM. Consider deploying a model onto a smartphone: a model that occupies several gigabytes of space not only consumes precious storage but might also be slower due to the need to load large weights into memory. This becomes especially crucial for edge devices like security cameras or drones, where minimal memory footprints are vital for storage and rapid data processing.\nPower Consumption becomes especially crucial for devices that rely on batteries. For instance, a wearable health monitor using a power-hungry model could drain its battery in hours, rendering it impractical for continuous health monitoring. Optimizing models for low power consumption becomes essential as we move toward an era dominated by IoT devices, where many devices operate on battery power.\nInference Time is about how swiftly a model can produce results. In applications like autonomous driving, where split-second decisions are the difference between safety and calamity, models must operate rapidly. If a self-driving car’s model takes even a few seconds too long to recognize an obstacle, the consequences could be dire. Hence, ensuring a model’s inference time aligns with the real-time demands of its application is paramount.\nIn essence, these efficiency metrics are more than numbers dictating where and how a model can be effectively deployed. A model might boast high accuracy, but if its FLOPs, memory usage, power consumption, or inference time make it unsuitable for its intended platform or real-world scenarios, its practical utility becomes limited.\n\n\n8.7.2 Efficiency Comparisons\nThe ecosystem contains an abundance of models, each boasting its unique strengths and idiosyncrasies. However, pure model accuracy figures or training and inference speeds paint a partial picture. When we dive deeper into comparative analyses, several critical nuances emerge.\nOften, we encounter the delicate balance between accuracy and efficiency. For instance, while a dense, deep learning model and a lightweight MobileNet variant might excel in image classification, their computational demands could be at two extremes. This differentiation is especially pronounced when comparing deployments on resource-abundant cloud servers versus constrained TinyML devices. In many real-world scenarios, the marginal gains in accuracy could be overshadowed by the inefficiencies of a resource-intensive model.\nMoreover, the optimal model choice is only sometimes universal but often depends on the specifics of an application. Consider object detection: a model that excels in general scenarios that might falter in niche environments, such as when detecting manufacturing defects on a factory floor. This adaptability- or the lack of it- can dictate a model’s real-world utility.\nAnother important consideration is the relationship between model complexity and its practical benefits. Take voice-activated assistants, such as “Alexa” or “OK Google.” While a complex model might demonstrate a marginally superior understanding of user speech if it’s slower to respond than a simpler counterpart, the user experience could be compromised. Thus, adding layers or parameters only sometimes equates to better real-world outcomes.\nFurthermore, while benchmark datasets, such as ImageNet (Russakovsky et al. 2015), COCO (Lin et al. 2014), Visual Wake Words (Wang and Zhan 2019), Google Speech Commands (Warden 2018), etc. provide a standardized performance metric, they might not capture the diversity and unpredictability of real-world data. Two facial recognition models with similar benchmark scores might exhibit varied competencies when faced with diverse ethnic backgrounds or challenging lighting conditions. Such disparities underscore the importance of robustness and consistency across varied data. For example, Figure 8.6 from the Dollar Street dataset shows stove images across extreme monthly incomes. Stoves have different shapes and technological levels across different regions and income levels. A model that is not trained on diverse datasets might perform well on a benchmark but fail in real-world applications. So, if a model was trained on pictures of stoves found in wealthy countries only, it would fail to recognize stoves from poorer regions.\n\nRussakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. 2015. “ImageNet Large Scale Visual Recognition Challenge.” Int. J. Comput. Vision 115 (3): 211–52. https://doi.org/10.1007/s11263-015-0816-y.\n\nLin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. “Microsoft Coco: Common Objects in Context.” In Computer VisionECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part v 13, 740–55. Springer.\n\nWang, LingFeng, and YaQing Zhan. 2019. “A Conceptual Peer Review Model for arXiv and Other Preprint Databases.” Learn. Publ. 32 (3): 213–19. https://doi.org/10.1002/leap.1229.\n\nWarden, Pete. 2018. “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition.” arXiv Preprint arXiv:1804.03209.\n\n\n\n\n\n\nFigure 8.6: Different types of stoves. Credit: Dollar Street stove images.\n\n\n\nIn essence, a thorough comparative analysis transcends numerical metrics. It’s a holistic assessment intertwined with real-world applications, costs, and the intricate subtleties that each model brings to the table. This is why having standard benchmarks and metrics widely established and adopted by the community becomes important.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -902,7 +902,7 @@ "section": "8.8 Conclusion", "text": "8.8 Conclusion\nEfficient AI is extremely important as we push towards broader and more diverse real-world deployment of machine learning. This chapter provided an overview, exploring the various methodologies and considerations behind achieving efficient AI, starting with the fundamental need, similarities, and differences across cloud, Edge, and TinyML systems.\nWe saw that efficient model architectures can be useful for optimizations. Model compression techniques such as pruning, quantization, and knowledge distillation exist to help reduce computational demands and memory footprint without significantly impacting accuracy. Specialized hardware like TPUs and NN accelerators offer optimized silicon for neural network operations and data flow. Efficient numerics balance precision and efficiency, enabling models to attain robust performance using minimal resources. In the subsequent chapters, we will explore these different topics in depth and in detail.\nTogether, these form a holistic framework for efficient AI. But the journey doesn’t end here. Achieving optimally efficient intelligence requires continued research and innovation. As models become more sophisticated, datasets grow, and applications diversify into specialized domains, efficiency must evolve in lockstep. Measuring real-world impact requires nuanced benchmarks and standardized metrics beyond simplistic accuracy figures.\nMoreover, efficient AI expands beyond technological optimization and encompasses costs, environmental impact, and ethical considerations for the broader societal good. As AI permeates industries and daily lives, a comprehensive outlook on efficiency underpins its sustainable and responsible progress. The subsequent chapters will build upon these foundational concepts, providing actionable insights and hands-on best practices for developing and deploying efficient AI solutions.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -911,9 +911,9 @@ "href": "contents/efficient_ai/efficient_ai.html#sec-efficient-ai-resource", "title": "8  Efficient AI", "section": "8.9 Resources", - "text": "8.9 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nDeploying on Edge Devices: challenges and techniques.\nModel Evaluation.\nContinuous Evaluation Challenges for TinyML.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "8.9 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nDeploying on Edge Devices: challenges and techniques.\nModel Evaluation.\nContinuous Evaluation Challenges for TinyML.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Training", "8  Efficient AI" ] }, @@ -924,7 +924,7 @@ "section": "", "text": "9.1 Introduction\nWe have structured this chapter in three tiers. First, in Section 9.2 we examine the significance and methodologies of reducing the parameter complexity of models without compromising their inference capabilities. Techniques such as pruning and knowledge distillation are discussed, offering insights into how models can be compressed and simplified while maintaining, or even enhancing, their performance.\nGoing one level lower, in Section 9.3, we study the role of numerical precision in model computations and how altering it impacts model size, speed, and accuracy. We will examine the various numerical formats and how reduced-precision arithmetic can be leveraged to optimize models for embedded deployment.\nFinally, as we go lower and closer to the hardware, in Section 9.4, we will navigate through the landscape of hardware-software co-design, exploring how models can be optimized by tailoring them to the specific characteristics and capabilities of the target hardware. We will discuss how models can be adapted to exploit the available hardware resources effectively.", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -935,7 +935,7 @@ "section": "", "text": "Figure 9.1: Three layers to be covered.", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -946,7 +946,7 @@ "section": "9.2 Efficient Model Representation", "text": "9.2 Efficient Model Representation\nThe first avenue of attack for model optimization starts in familiar territory for most ML practitioners: efficient model representation is often first tackled at the highest level of parametrization abstraction - the model’s architecture itself.\nMost traditional ML practitioners design models with a general high-level objective in mind, whether it be image classification, person detection, or keyword spotting as mentioned previously in this textbook. Their designs generally end up naturally fitting into some soft constraints due to limited compute resources during development, but generally these designs are not aware of later constraints, such as those required if the model is to be deployed on a more constrained device instead of the cloud.\nIn this section, we’ll discuss how practitioners can harness principles of hardware-software co-design even at a model’s high level architecture to make their models compatible with edge devices. From most to least hardware aware at this level of modification, we discuss several of the most common strategies for efficient model parametrization: pruning, model compression, and edge-friendly model architectures.\n\n9.2.1 Pruning\n\nOverview\nModel pruning is a technique in machine learning that aims to reduce the size and complexity of a neural network model while maintaining its predictive capabilities as much as possible. The goal of model pruning is to remove redundant or non-essential components of the model, including connections between neurons, individual neurons, or even entire layers of the network.\nThis process typically involves analyzing the machine learning model to identify and remove weights, nodes, or layers that have little impact on the model’s outputs. By selectively pruning a model in this way, the total number of parameters can be reduced significantly without substantial declines in model accuracy. The resulting compressed model requires less memory and computational resources to train and run while enabling faster inference times.\nModel pruning is especially useful when deploying machine learning models to devices with limited compute resources, such as mobile phones or TinyML systems. The technique facilitates the deployment of larger, more complex models on these devices by reducing their resource demands. Additionally, smaller models require less data to generalize well and are less prone to overfitting. By providing an efficient way to simplify models, model pruning has become a vital technique for optimizing neural networks in machine learning.\nThere are several common pruning techniques used in machine learning, these include structured pruning, unstructured pruning, iterative pruning, bayesian pruning, and even random pruning. In addition to pruning the weights, one can also prune the activations. Activation pruning specifically targets neurons or filters that activate rarely or have overall low activation. There are numerous other methods, such as sensitivity and movement pruning. For a comprehensive list of methods, the reader is encouraged to read the following paper: “A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations” (2023).\nSo how does one choose the type of pruning methods? Many variations of pruning techniques exist where each varies the heuristic of what should be kept and pruned from the model as well the number of times pruning occurs. Traditionally, pruning happens after the model is fully trained, where the pruned model may experience mild accuracy loss. However, as we will discuss further, recent discoveries have found that pruning can be used during training (i.e., iteratively) to identify more efficient and accurate model representations.\n\n\nStructured Pruning\nWe start with structured pruning, a technique that reduces the size of a neural network by eliminating entire model-specific substructures while maintaining the overall model structure. It removes entire neurons/channels or layers based on importance criteria. For example, for a convolutional neural network (CNN), this could be certain filter instances or channels. For fully connected networks, this could be neurons themselves while maintaining full connectivity or even be elimination of entire model layers that are deemed to be insignificant. This type of pruning often leads to regular, structured sparse networks that are hardware friendly.\n\nComponents\nBest practices have started to emerge on how to think about structured pruning. There are three main components:\n\nStructures to target for pruning\nEstablishing a criteria for pruning\nSelecting a pruning strategy\n\n\n\nStructures to target for pruning\nGiven that there are different strategies, each of these structures (i.e., neurons, channels and layers) is pruned based on specific criteria and strategies, ensuring that the reduced model maintains as much of the predictive prowess of the original model as possible while gaining in computational efficiency and reduction in size.\nThe primary structures targeted for pruning include neurons, channels, and sometimes, entire layers, each having its unique implications and methodologies. When neurons are pruned, we are removing entire neurons along with their associated weights and biases, thereby reducing the width of the layer. This type of pruning is often utilized in fully connected layers.\nWith channel pruning, which is predominantly applied in convolutional neural networks (CNNs), it involves eliminating entire channels or filters, which in turn reduces the depth of the feature maps and impacts the network’s ability to extract certain features from the input data. This is particularly crucial in image processing tasks where computational efficiency is paramount.\nFinally, layer pruning takes a more aggressive approach by removing entire layers of the network. This significantly reduces the network’s depth and thereby its capacity to model complex patterns and hierarchies in the data. This approach necessitates a careful balance to ensure that the model’s predictive capability is not unduly compromised.\nFigure 9.2 demonstrates the difference between channel/filter wise pruning and layer pruning. When we prune a channel, we have to reconfigure the model’s architecture in order to adapt to the structural changes. One adjustment is changing the number of input channels in the subsequent layer (here, the third and deepest layer): changing the depths of the filters that are applied to the layer with the pruned channel. On the other hand, pruning an entire layer (removing all the channels in the layer) requires more drastic adjustements. The main one involves modifying the connections between the remaining layers to replace or bypass the pruned layer. In our case, we reconfigured had to connect the first and last layers. In all pruning cases, we have to fine-tune the new structure to adjust the weights.\n\n\n\n\n\n\nFigure 9.2: Channel vs layer pruning.\n\n\n\n\n\nEstablishing a criteria for pruning\nEstablishing well-defined criteria for determining which specific structures to prune from a neural network model is a crucial component of the model pruning process. The core goal here is to identify and remove components that contribute the least to the model’s predictive capabilities, while retaining structures integral to preserving the model’s accuracy.\nA widely adopted and effective strategy for systematically pruning structures relies on computing importance scores for individual components like neurons, filters, channels or layers. These scores serve as quantitative metrics to gauge the significance of each structure and its effect on the model’s output.\nThere are several techniques for assigning these importance scores:\n\nWeight magnitude-based pruning assigns scores based on the absolute values of the weights. Components with very small weights contribute minimally to activations and can be removed.\nGradient-based pruning utilizes the gradients of the loss function with respect to each weight to determine sensitivity. Weights with low gradient magnitudes when altered have little effect on the loss and can be pruned.\nActivation-based pruning tracks activation values for neurons/filters over a validation dataset. Consistently low activation values suggest less relevance, warranting removal.\nTaylor expansion approximates the change in loss function from removing a given weight. Weights with negligible impact on loss are prime candidates for pruning.\n\nThe idea is to measure, either directly or indirectly, the contribution of each component to the model’s output. Structures with minimal influence according to the defined criteria are pruned first. This enables selective, optimized pruning that maximally compresses models while preserving predictive capacity. In general, it is important to evaluate the impact of removing particular structures on the model’s output.\n\n\nSelecting a pruning strategy\nThe pruning strategy orchestrates how structures are removed and integrates with subsequent model fine-tuning to recover predictive performance. Two main structured pruning strategies exist: iterative pruning and one-shot pruning.\nIterative pruning gradually removes structures across multiple cycles of pruning followed by fine-tuning. In each cycle, a small set of structures are pruned based on importance criteria. The model is then fine-tuned, allowing it to adjust smoothly to the structural changes before the next pruning iteration. This gradual, cyclic approach prevents abrupt accuracy drops. It allows the model to slowly adapt as structures are reduced across iterations.\nConsider a situation where we wish to prune the 6 least effective channels (based on some specific critera) from a convolutional neural network. In Figure 9.3, we show a simplified pruning process carried over 3 iterations. In every iteration, we only prune 2 channels. Removing the channels results in accuracy degradation. In the first iteration, the accuracy drops from 0.995 to 0.971. However, after we fine-tune the model on the new structure, we are able to recover from the performance loss, bringing the accuracy up to 0.992. Since the structural changes are minor and gradual, the network can more easily adapt to them. Running the same process 2 more times, we end up with a final accuracy of 0.991 (a loss of only 0.4% from the original) and 27% decrease in the number of channels. Thus, iterative pruning enables us to maintain performance while benefiting from increased computational efficiency due to the decreased model size.\n\n\n\n\n\n\nFigure 9.3: Iterative pruning.\n\n\n\nOne-shot pruning takes a more aggressive approach by pruning a large portion of structures simultaneously in one shot based on predefined importance criteria. This is followed by extensive fine-tuning to recover model accuracy. While faster, this aggressive strategy can degrade accuracy if the model cannot recover during fine-tuning.\nThe choice between these strategies involves weighing factors like model size, target sparsity level, available compute and acceptable accuracy losses. One-shot pruning can rapidly compress models, but iterative pruning may enable better accuracy retention for a target level of pruning. In practice, the strategy is tailored based on use case constraints. The overarching aim is to generate an optimal strategy that removes redundancy, achieves efficiency gains through pruning, and finely tunes the model to stabilize accuracy at an acceptable level for deployment.\nNow consider the same network we had in the iterative pruning example. Whereas in the iterative process we pruned 2 channels at a time, in the one-shot pruning we would prune the 6 channels at once (Figure 9.4). Removing 27% of the network’s channel simultaneously alters the structure significantly, causing the accuracy to drop from 0.995 to 0.914. Given the major changes, the network is not able to properly adapt during fine-tuning, and the accuracy went up to 0.943, a 5% degradation from the accuracy of the unpruned network. While the final structures in both iterative pruning and oneshot pruning processes are identical, the former is able to maintain high performance while the latter suffers significant degradations.\n\n\n\n\n\n\nFigure 9.4: One-shot pruning.\n\n\n\n\n\n\nAdvantages of Structured Pruning\nStructured pruning brings forth a myriad of advantages that cater to various facets of model deployment and utilization, especially in environments where computational resources are constrained.\n\nComputational Efficiency\nBy eliminating entire structures, such as neurons or channels, structured pruning significantly diminishes the computational load during both training and inference phases, thereby enabling faster model predictions and training convergence. Moreover, the removal of structures inherently reduces the model’s memory footprint, ensuring that it demands less storage and memory during operation, which is particularly beneficial in memory-constrained environments like TinyML systems.\n\n\nHardware Efficiency\nStructured pruning often results in models that are more amenable to deployment on specialized hardware, such as Field-Programmable Gate Arrays (FPGAs) or Application-Specific Integrated Circuits (ASICs), due to the regularity and simplicity of the pruned architecture. With reduced computational requirements, it translates to lower energy consumption, which is crucial for battery-powered devices and sustainable computing practices.\n\n\nMaintenance and Deployment\nThe pruned model, while smaller, retains its original architectural form, which can simplify the deployment pipeline and ensure compatibility with existing systems and frameworks. Also, with fewer parameters and simpler structures, the pruned model becomes easier to manage and monitor in production environments, potentially reducing the overhead associated with model maintenance and updates. Later on, when we dive into MLOps, this need will become apparent.\n\n\n\nUnstructured Pruning\nUnstructured pruning is, as its name suggests, pruning the model without regard to model-specific substructure. As mentioned above, it offers a greater aggression in pruning and can achieve higher model sparsities while maintaining accuracy given less constraints on what can and can’t be pruned. Generally, post-training unstructured pruning consists of an importance criterion for individual model parameters/weights, pruning/removal of weights that fall below the criteria, and optional fine-tuning after to try and recover the accuracy lost during weight removal.\nUnstructured pruning has some advantages over structured pruning: removing individual weights instead of entire model substructures often leads in practice to lower model accuracy decreases. Furthermore, generally determining the criterion of importance for an individual weight is much simpler than for an entire substructure of parameters in structured pruning, making the former preferable for cases where that overhead is hard or unclear to compute. Similarly, the actual process of structured pruning is generally less flexible, as removing individual weights is generally simpler than removing entire substructures and ensuring the model still works.\nUnstructured pruning, while offering the potential for significant model size reduction and enhanced deployability, brings with it challenges related to managing sparse representations and ensuring computational efficiency. It is particularly useful in scenarios where achieving the highest possible model compression is paramount and where the deployment environment can handle sparse computations efficiently.\nTable 9.1 provides a concise comparison between structured and unstructured pruning. In this table, aspects related to the nature and architecture of the pruned model (Definition, Model Regularity, and Compression Level) are grouped together, followed by aspects related to computational considerations (Computational Efficiency and Hardware Compatibility), and ending with aspects related to the implementation and adaptation of the pruned model (Implementation Complexity and Fine-Tuning Complexity). Both pruning strategies offer unique advantages and challenges, as shown in Table 9.1, and the selection between them should be influenced by specific project and deployment requirements.\n\n\n\nTable 9.1: Comparison of structured versus unstructured pruning.\n\n\n\n\n\n\n\n\n\n\nAspect\nStructured Pruning\nUnstructured Pruning\n\n\n\n\nDefinition\nPruning entire structures (e.g., neurons, channels, layers) within the network\nPruning individual weights or neurons, resulting in sparse matrices or non-regular network structures\n\n\nModel Regularity\nMaintains a regular, structured network architecture\nResults in irregular, sparse network architectures\n\n\nCompression Level\nMay offer limited model compression compared to unstructured pruning\nCan achieve higher model compression due to fine-grained pruning\n\n\nComputational Efficiency\nTypically more computationally efficient due to maintaining regular structures\nCan be computationally inefficient due to sparse weight matrices, unless specialized hardware/software is used\n\n\nHardware Compatibility\nGenerally better compatible with various hardware due to regular structures\nMay require hardware that efficiently handles sparse computations to realize benefits\n\n\nImplementation Complexity\nOften simpler to implement and manage due to maintaining network structure\nCan be complex to manage and compute due to sparse representations\n\n\nFine-Tuning Complexity\nMay require less complex fine-tuning strategies post-pruning\nMight necessitate more complex retraining or fine-tuning strategies post-pruning\n\n\n\n\n\n\nIn Figure 9.5 we have exapmles that illustrate the differences between unstructured and structured pruning. Observe that unstructured pruning can lead to models that no longer obey high-level structural guaruntees of their original unpruned counterparts: the left network is no longer a fully connected network after pruning. Structured pruning on the other hand maintains those invariants: in the middle, the fully connected network is pruned in a way that the pruned network is still fully connected; likewise, the CNN maintains its convolutional structure, albeit with fewer filters.\n\n\n\n\n\n\nFigure 9.5: Unstructured vs structured pruning. Credit: Qi et al. (2021).\n\n\nQi, Chen, Shibo Shen, Rongpeng Li, Zhifeng Zhao, Qing Liu, Jing Liang, and Honggang Zhang. 2021. “An Efficient Pruning Scheme of Deep Neural Networks for Internet of Things Applications.” EURASIP Journal on Advances in Signal Processing 2021 (1). https://doi.org/10.1186/s13634-021-00744-4.\n\n\n\n\nLottery Ticket Hypothesis\nPruning has evolved from a purely post-training technique that came at the cost of some accuracy, to a powerful meta-learning approach applied during training to reduce model complexity. This advancement in turn improves compute, memory, and latency efficiency at both training and inference.\nA breakthrough finding that catalyzed this evolution was the lottery ticket hypothesis by Frankle and Carbin (2019). They empirically discovered by Jonathan Frankle and Michael Carbin. Their work states that within dense neural networks, there exist sparse subnetworks, referred to as “winning tickets,” that can match or even exceed the performance of the original model when trained in isolation. Specifically, these winning tickets, when initialized using the same weights as the original network, can achieve similarly high training convergence and accuracy on a given task. It is worthwhile pointing out that they empirically discovered the lottery ticket hypothesis, which was later formalized.\n\nFrankle, Jonathan, and Michael Carbin. 2019. “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks.” In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=rJl-b3RcF7.\nThe intuition behind this hypothesis is that, during the training process of a neural network, many neurons and connections become redundant or unimportant, particularly with the inclusion of training techniques encouraging redundancy like dropout. Identifying, pruning out, and initializing these “winning tickets’’ allows for faster training and more efficient models, as they contain the essential model decision information for the task. Furthermore, as generally known with the bias-variance tradeoff theory, these tickets suffer less from overparameterization and thus generalize better rather than overfitting to the task.\nIn Figure 9.6 we have an example experiment showing pruning and training experiments on a fully connected LeNet over a variety of pruning ratios. In the left plot, notice how heavy pruning reveals a more efifcient subnetwork (in green) that is 21.1% the size of the original network (in blue), The subnetwork achieves higher accuracy and in a faster manner than the unpruned version (green line is above the blue line). However, pruning has a limit (sweet spot), and further pruning will produce performance degredations and eventually drop below the unpruned version’s performance (notice how the red, purple, and brown subnetworks gradually drop in accuracy performance) due to the significant loss in the number of parameters.\n\n\n\n\n\n\nFigure 9.6: Lottery ticket hypothesis experiments.\n\n\n\nThe following is the process of finding the winning lottery ticket subnetwork, as also shown in Figure 9.7 (left side):\n1- Initialize the network’s weights to random values.\n2- Train the network until it converges to the desired performance.\n3- Prune out some percentage of the edges with the lowest weight values.\n4- Reinitialize the network with the same random values from step 1.\n5- Repeat steps 2-4 for a number of times, or as long as the accuracy doesn’t significantly degrade.\nWhen we finish, we are left with a pruned network (Figure 9.7 right side), which is a subnetwork of the one we start with. The subnetwork should have a significantly smaller structure, while maintaining a comparable level of accuracy.\n\n\n\n\n\n\nFigure 9.7: Finding the winning ticket subnetwork.\n\n\n\n\n\nChallenges & Limitations\nThere is no free lunch with pruning optimizations, with some choices coming with both improvements and costs to considers. Below we discuss some tradeoffs for practitioners to consider.\n\nQuality vs. Size Reduction\nA key challenge in both structured and unstructured pruning is balancing size reduction with maintaining or improving predictive performance. This trade-off becomes more complex with unstructured pruning, where individual weight removal can create sparse weight matrices. Ensuring the pruned model retains generalization capacity while becoming more computationally efficient is critical, often requiring extensive experimentation and validation.\n\n\nDetermining Pruning Criteria\nEstablishing a robust pruning criteria, whether for removing entire structures (structured pruning) or individual weights (unstructured pruning), is challenging. The criteria must accurately identify elements whose removal minimally impacts performance. For unstructured pruning, this might involve additional complexities due to the potential for generating sparse weight matrices, which can be computationally inefficient on certain hardware.\n\n\nFine-Tuning and Retraining\nPost-pruning fine-tuning is imperative in both structured and unstructured pruning to recover lost performance and stabilize the model. The challenge encompasses determining the extent, duration, and nature of the fine-tuning process, which can be influenced by the pruning method and the degree of pruning applied.\n\n\nScalability of Pruning Strategies\nEnsuring that pruning strategies, whether structured or unstructured, are scalable and applicable across various models and domains is challenging. Unstructured pruning might introduce additional challenges related to managing and deploying models with sparse weight matrices, especially in hardware that is not optimized for sparse computations.\n\n\nHardware Compatibility and Efficiency\nEspecially pertinent to unstructured pruning, hardware compatibility and efficiency become critical. Unstructured pruning often results in sparse weight matrices, which may not be efficiently handled by certain hardware, potentially negating the computational benefits of pruning (see Figure 9.8). Ensuring that pruned models, particularly those resulting from unstructured pruning, are compatible and efficient on the target hardware is a significant consideration.\n\n\nComplexity in Implementing Pruning Algorithms\nUnstructured pruning might introduce additional complexity in implementing pruning algorithms due to the need to manage sparse representations of weights. Developing or adapting algorithms that can efficiently handle, store, and compute sparse weight matrices is an additional challenge and consideration in unstructured pruning.\n\n\nLegal and Ethical Considerations\nLast but not least, adherence to legal and ethical guidelines is paramount, especially in domains with significant consequences. Both pruning methods must undergo rigorous validation, testing, and potentially certification processes to ensure compliance with relevant regulations and standards. This is especially important in use cases like medical AI applications or autonomous driving where quality drops due to pruning like optimizations can be life threatening.\n\n\n\n\n\n\nFigure 9.8: Sparse weight matrix.\n\n\n\n\n\n\n\n\n\nExercise 9.1: Pruning\n\n\n\n\n\nImagine your neural network is a giant, overgrown bush. Pruning is like strategically trimming away branches to make it stronger and more efficient! In the Colab, you’ll learn how to do this trimming in TensorFlow. Understanding these concepts will give you the foundation to see how pruning makes models small enough to run on your phone!\n\n\n\n\n\n\n\n\n9.2.2 Model Compression\nModel compression techniques are crucial for deploying deep learning models on resource-constrained devices. These techniques aim to create smaller, more efficient models that preserve the predictive performance of the original models.\n\nKnowledge Distillation\nOne popular technique is knowledge distillation (KD), which transfers knowledge from a large, complex “teacher” model to a smaller “student” model. The key idea is to train the student model to mimic the teacher’s outputs. The concept of KD was first popularized by Hinton (2005).\n\nHinton, Geoffrey. 2005. “Van Nostrand’s Scientific Encyclopedia.” Wiley. https://doi.org/10.1002/0471743984.vse0673.\n\nOverview and Benefits\nAt its core, KD strategically leverages the refined outputs of a pre-trained teacher model to transfer knowledge to a smaller student model. The key technique is using “soft targets” derived from the teacher’s probabilistic predictions. Specifically, the teacher’s outputs are passed through a temperature-scaled softmax function, yielding softened probability distributions over classes. This softening provides richer supervision signals for the student model compared to hard target labels.\nThe loss function is another critical component that typically amalgamates a distillation loss, which measures the divergence between the teacher and student outputs, and a classification loss, which ensures the student model adheres to the true data labels. The Kullback-Leibler (KL) divergence is commonly employed to quantify the distillation loss, providing a measure of the discrepancy between the probability distributions output by the teacher and student models.\nAnother core concept is “temperature scaling” in the softmax function. It plays the role of controlling the granularity of the information distilled from the teacher model. A higher temperature parameter produces softer, more informative distributions, thereby facilitating the transfer of more nuanced knowledge to the student model. However, it also introduces the challenge of effectively balancing the trade-off between the informativeness of the soft targets and the stability of the training process.\nThese components, when adeptly configured and harmonized, enable the student model to assimilate the teacher model’s knowledge, crafting a pathway towards efficient and robust smaller models that retain the predictive prowess of their larger counterparts. Figure 9.9 visualizes the training procedure of knowledge distillation. Note how the logits or soft labels of the teacher model are used to provide a distillation loss for the student model to learn from.\n\n\n\n\n\n\nFigure 9.9: Knowledge distillation training process. Credit: IntelLabs (2023).\n\n\nIntelLabs. 2023. “Knowledge Distillation - Neural Network Distiller.” https://intellabs.github.io/distiller/knowledge_distillation.html.\n\n\n\n\nChallenges\nHowever, KD has a unique set of challenges and considerations that researchers and practitioners must attentively address. One of the challenges is in the meticulous tuning of hyperparameters, such as the temperature parameter in the softmax function and the weighting between the distillation and classification loss in the objective function. Striking a balance that effectively leverages the softened outputs of the teacher model while maintaining fidelity to the true data labels is non-trivial and can significantly impact the student model’s performance and generalization capabilities.\nFurthermore, the architecture of the student model itself poses a considerable challenge. Designing a model that is compact to meet computational and memory constraints, while still being capable of assimilating the essential knowledge from the teacher model, demands a nuanced understanding of model capacity and the inherent trade-offs involved in compression. The student model must be carefully architected to navigate the dichotomy of size and performance, ensuring that the distilled knowledge is meaningfully captured and utilized. Moreover, the choice of teacher model, which inherently influences the quality and nature of the knowledge to be transferred, is important and it introduces an added layer of complexity to the KD process.\nThese challenges underscore the necessity for a thorough and nuanced approach to implementing KD, ensuring that the resultant student models are both efficient and effective in their operational contexts.\n\n\n\nLow-rank Matrix Factorization\nSimilar in approximation theme, low-rank matrix factorization (LRMF) is a mathematical technique used in linear algebra and data analysis to approximate a given matrix by decomposing it into two or more lower-dimensional matrices. The fundamental idea is to express a high-dimensional matrix as a product of lower-rank matrices, which can help reduce the complexity of data while preserving its essential structure. Mathematically, given a matrix \\(A \\in \\mathbb{R}^{m \\times n}\\), LRMF seeks matrices \\(U \\in \\mathbb{R}^{m \\times k}\\) and \\(V \\in \\mathbb{R}^{k \\times n}\\) such that \\(A \\approx UV\\), where \\(k\\) is the rank and is typically much smaller than \\(m\\) and \\(n\\).\n\nBackground and Benefits\nOne of the seminal works in the realm of matrix factorization, particularly in the context of recommendation systems, is the paper by Koren, Bell, and Volinsky (2009). The authors delve into various factorization models, providing insights into their efficacy in capturing the underlying patterns in the data and enhancing predictive accuracy in collaborative filtering. LRMF has been widely applied in recommendation systems (such as Netflix, Facebook, etc.), where the user-item interaction matrix is factorized to capture latent factors corresponding to user preferences and item attributes.\n\nKoren, Yehuda, Robert Bell, and Chris Volinsky. 2009. “Matrix Factorization Techniques for Recommender Systems.” Computer 42 (8): 30–37. https://doi.org/10.1109/mc.2009.263.\nThe main advantage of low-rank matrix factorization lies in its ability to reduce data dimensionality as shown in Figure 9.10, where there are fewer parameters to store, making it computationally more efficient and reducing storage requirements at the cost of some additional compute. This can lead to faster computations and more compact data representations, which is especially valuable when dealing with large datasets. Additionally, it may aid in noise reduction and can reveal underlying patterns and relationships in the data.\nFigure 9.10 illustrates the decrease in parameterization enabled by low-rank matrix factorization. Observe how the matrix \\(M\\) can be approximated by the product of matrices \\(L_k\\) and \\(R_k^T\\). For intuition, most fully connected layers in networks are stored as a projection matrix \\(M\\), which requires \\(m \\times n\\) parameter to be loaded on computation. However, by decomposing and approximating it as the product of two lower rank matrices, we thus only need to store \\(m \\times k + k\\times n\\) parameters in terms of storage while incurring an additional compute cost of the matrix multiplication. So long as \\(k < n/2\\), this factorization has fewer parameters total to store while adding a computation of runtime \\(O(mkn)\\) (Gu (2023)).\n\nGu, Ivy. 2023. “Deep Learning Model Compression (Ii) by Ivy Gu Medium.” https://ivygdy.medium.com/deep-learning-model-compression-ii-546352ea9453.\n\n\n\n\n\n\nFigure 9.10: Low matrix factorization. Credit: The Clever Machine.\n\n\n\n\n\nChallenges\nBut practitioners and researchers encounter a spectrum of challenges and considerations that necessitate careful attention and strategic approaches. As with any lossy compression technique, we may lose information during this approximation process: choosing the correct rank that balances the information lost and the computational costs is tricky as well and adds an additional hyper-parameter to tune for.\nLow-rank matrix factorization is a valuable tool for dimensionality reduction and making compute fit onto edge devices but, like other techniques, needs to be carefully tuned to the model and task at hand. A key challenge resides in managing the computational complexity inherent to LRMF, especially when grappling with high-dimensional and large-scale data. The computational burden, particularly in the context of real-time applications and massive datasets, remains a significant hurdle for effectively using LRMF.\nMoreover, the conundrum of choosing the optimal rank, (k), for the factorization introduces another layer of complexity. The selection of (k) inherently involves a trade-off between approximation accuracy and model simplicity, and identifying a rank that adeptly balances these conflicting objectives often demands a combination of domain expertise, empirical validation, and sometimes, heuristic approaches. The challenge is further amplified when the data encompasses noise or when the inherent low-rank structure is not pronounced, making the determination of a suitable (k) even more elusive.\nHandling missing or sparse data, a common occurrence in applications like recommendation systems, poses another substantial challenge. Traditional matrix factorization techniques, such as Singular Value Decomposition (SVD), are not directly applicable to matrices with missing entries, necessitating the development and application of specialized algorithms that can factorize incomplete matrices while mitigating the risks of overfitting to the observed entries. This often involves incorporating regularization terms or constraining the factorization in specific ways, which in turn introduces additional hyperparameters that need to be judiciously selected.\nFurthermore, in scenarios where data evolves or grows over time, developing LRMF models that can adapt to new data without necessitating a complete re-factorization is a critical yet challenging endeavor. Online and incremental matrix factorization algorithms seek to address this by enabling the update of factorized matrices as new data arrives, yet ensuring stability, accuracy, and computational efficiency in these dynamic settings remains an intricate task. This is particularly challenging in the space of TinyML, where edge redeployment for refreshed models can be quite challenging.\n\n\n\nTensor Decomposition\nSimilar to low-rank matrix factorization, more complex models may store weights in higher dimensions, such as tensors: tensor decomposition is the higher-dimensional analogue of matrix factorization, where a model tensor is decomposed into lower rank components (see Figure 9.11), which again are easier to compute on and store but may suffer from the same issues as mentioned above of information loss and nuanced hyperparameter tuning. Mathematically, given a tensor \\(\\mathcal{A}\\), tensor decomposition seeks to represent \\(\\mathcal{A}\\) as a combination of simpler tensors, facilitating a compressed representation that approximates the original data while minimizing the loss of information.\nThe work of Tamara G. Kolda and Brett W. Bader, “Tensor Decompositions and Applications” (2009), stands out as a seminal paper in the field of tensor decompositions. The authors provide a comprehensive overview of various tensor decomposition methods, exploring their mathematical underpinnings, algorithms, and a wide array of applications, ranging from signal processing to data mining. Of course, the reason we are discussing it is because it has huge potential for system performance improvements, particularly in the space of TinyML, where throughput and memory footprint savings are crucial to feasibility of deployments.\n\n\n\n\n\n\nFigure 9.11: Tensor decomposition. Credit: Xinyu (n.d.).\n\n\nXinyu, Chen. n.d.\n\n\n\n\n\n\n\n\nExercise 9.2: Scalable Model Compression with TensorFlow\n\n\n\n\n\nThis Colab dives into a technique for compressing models while maintaining high accuracy. The key idea is to train a model with an extra penalty term that encourages the model to be more compressible. Then, the model is encoded using a special coding scheme that aligns with this penalty. This approach allows you to achieve compressed models that perform just as well as the original models and is useful in deploying models to devices with limited resources like mobile phones and edge devices.\n\n\n\n\n\n\n\n9.2.3 Edge-Aware Model Design\nFinally, we reach the other end of the hardware-software gradient, where we specifically make model architecture decisions directly given knowledge of the edge devices we wish to deploy on.\nAs covered in previous sections, edge devices are constrained specifically with limitations on memory and parallelizable computations: as such, if there are critical inference speed requirements, computations must be flexible enough to satisfy hardware constraints, something that can be designed at the model architecture level. Furthermore, trying to cram SOTA large ML models onto edge devices even after pruning and compression is generally infeasible purely due to size: the model complexity itself must be chosen with more nuance as to more feasibly fit the device. Edge ML developers have approached this architectural challenge both through designing bespoke edge ML model architectures and through device-aware neural architecture search (NAS), which can more systematically generate feasible on-device model architectures.\n\nModel Design Techniques\nOne edge friendly architecture design is depthwise separable convolutions. Commonly used in deep learning for image processing, it consists of two distinct steps: the first is the depthwise convolution, where each input channel is convolved independently with its own set of learnable filters, as show in Figure 9.12. This step reduces computational complexity by a significant margin compared to standard convolutions, as it drastically reduces the number of parameters and computations involved. The second step is the pointwise convolution, which combines the output of the depthwise convolution channels through a 1x1 convolution, creating inter-channel interactions. This approach offers several advantages. Pros include reduced model size, faster inference times, and often better generalization due to fewer parameters, making it suitable for mobile and embedded applications. However, depthwise separable convolutions may not capture complex spatial interactions as effectively as standard convolutions and might require more depth (layers) to achieve the same level of representational power, potentially leading to longer training times. Nonetheless, their efficiency in terms of parameters and computation makes them a popular choice in modern convolutional neural network architectures.\n\n\n\n\n\n\nFigure 9.12: Depthwise separable convolutions. Credit: Hegde (2023).\n\n\nHegde, Sumant. 2023. “An Introduction to Separable Convolutions - Analytics Vidhya.” https://www.analyticsvidhya.com/blog/2021/11/an-introduction-to-separable-convolutions/.\n\n\n\n\nExample Model Architectures\nIn this vein, a number of recent architectures have been, from inception, specifically designed for maximizing accuracy on an edge deployment, notably SqueezeNet, MobileNet, and EfficientNet.\n\nSqueezeNet by Iandola et al. (2016) for instance, utilizes a compact architecture with 1x1 convolutions and fire modules to minimize the number of parameters while maintaining strong accuracy.\nMobileNet by Howard et al. (2017), on the other hand, employs the aforementioned depthwise separable convolutions to reduce both computation and model size.\nEfficientNet by Tan and Le (2023) takes a different approach by optimizing network scaling (i.e. varying the depth, width and resolution of a network) and compound scaling, a more nuanced variation network scaling, to achieve superior performance with fewer parameters.\n\n\nIandola, Forrest N, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. “SqueezeNet: Alexnet-level Accuracy with 50x Fewer Parameters and 0.5 MB Model Size.” ArXiv Preprint abs/1602.07360. https://arxiv.org/abs/1602.07360.\n\nHoward, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” ArXiv Preprint. https://arxiv.org/abs/1704.04861.\n\nTan, Mingxing, and Quoc V. Le. 2023. “Demystifying Deep Learning.” Wiley. https://doi.org/10.1002/9781394205639.ch6.\nThese models are essential in the context of edge computing where limited processing power and memory require lightweight yet effective models that can efficiently perform tasks such as image recognition, object detection, and more. Their design principles showcase the importance of intentionally tailored model architecture for edge computing, where performance and efficiency must fit within constraints.\n\n\nStreamlining Model Architecture Search\nFinally, systematized pipelines for searching for performant edge-compatible model architectures are possible through frameworks like TinyNAS by J. Lin et al. (2020) and MorphNet by Gordon et al. (2018).\n\nGordon, Ariel, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, and Edward Choi. 2018. “MorphNet: Fast &Amp; Simple Resource-Constrained Structure Learning of Deep Networks.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1586–95. IEEE. https://doi.org/10.1109/cvpr.2018.00171.\nTinyNAS is an innovative neural architecture search framework introduced in the MCUNet paper, designed to efficiently discover lightweight neural network architectures for edge devices with limited computational resources. Leveraging reinforcement learning and a compact search space of micro neural modules, TinyNAS optimizes for both accuracy and latency, enabling the deployment of deep learning models on microcontrollers, IoT devices, and other resource-constrained platforms. Specifically, TinyNAS, in conjunction with a network optimizer TinyEngine, generates different search spaces by scaling the input resolution and the model width of a model, then collects the computation FLOPs distribution of satisfying networks within the search space to evaluate its priority. TinyNAS relies on the assumption that a search space that accommodates higher FLOPs under memory constraint can produce higher accuracy models, something that the authors verified in practice in their work. In empirical performance, TinyEngine reduced the peak memory usage of models by around 3.4 times and accelerated inference by 1.7 to 3.3 times compared to TFLite and CMSIS-NN.\nSimilarly, MorphNet is a neural network optimization framework designed to automatically reshape and morph the architecture of deep neural networks, optimizing them for specific deployment requirements. It achieves this through two steps: first, it leverages a set of customizable network morphing operations, such as widening or deepening layers, to dynamically adjust the network’s structure. These operations enable the network to adapt to various computational constraints, including model size, latency, and accuracy targets, which are extremely prevalent in edge computing usage. In the second step, MorphNet uses a reinforcement learning-based approach to search for the optimal permutation of morphing operations, effectively balancing the trade-off between model size and performance. This innovative method allows deep learning practitioners to automatically tailor neural network architectures to specific application and hardware requirements, ensuring efficient and effective deployment across various platforms.\nTinyNAS and MorphNet represent a few of the many significant advancements in the field of systematic neural network optimization, allowing architectures to be systematically chosen and generated to fit perfectly within problem constraints.\n\n\n\n\n\n\nExercise 9.3: Edge-Aware Model Design\n\n\n\n\n\nImagine you’re building a tiny robot that can identify different flowers. It needs to be smart, but also small and energy-efficient! In the “Edge-Aware Model Design” world, we learned about techniques like depthwise separable convolutions and architectures like SqueezeNet, MobileNet, and EfficientNet – all designed to pack intelligence into compact models. Now, let’s see these ideas in action with some xColabs:\nSqueezeNet in Action: Maybe you’d like a Colab showing how to train a SqueezeNet model on a flower image dataset. This would demonstrate its small size and how it learns to recognize patterns despite its efficiency.\n\nMobileNet Exploration: Ever wonder if those tiny image models are just as good as the big ones? Let’s find out! In this Colab, we’re pitting MobileNet, the lightweight champion, against a classic image classification model. We’ll race them for speed, measure their memory needs, and see who comes out on top for accuracy. Get ready for a battle of the image brains!", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -955,9 +955,9 @@ "href": "contents/optimizations/optimizations.html#sec-model_ops_numerics", "title": "9  Model Optimizations", "section": "9.3 Efficient Numerics Representation", - "text": "9.3 Efficient Numerics Representation\nNumerics representation involves a myriad of considerations, including, but not limited to, the precision of numbers, their encoding formats, and the arithmetic operations facilitated. It invariably involves a rich array of different trade-offs, where practitioners are tasked with navigating between numerical accuracy and computational efficiency. For instance, while lower-precision numerics may offer the allure of reduced memory usage and expedited computations, they concurrently present challenges pertaining to numerical stability and potential degradation of model accuracy.\n\nMotivation\nThe imperative for efficient numerics representation arises, particularly as efficient model optimization alone falls short when adapting models for deployment on low-powered edge devices operating under stringent constraints.\nBeyond minimizing memory demands, the tremendous potential of efficient numerics representation lies in, but is not limited to, these fundamental ways. By diminishing computational intensity, efficient numerics can thereby amplify computational speed, allowing more complex models to compute on low-powered devices. Reducing the bit precision of weights and activations on heavily over-parameterized models enables condensation of model size for edge devices without significantly harming the model’s predictive accuracy. With the omnipresence of neural networks in models, efficient numerics has a unique advantage in leveraging the layered structure of NNs to vary numeric precision across layers, minimizing precision in resistant layers while preserving higher precision in sensitive layers.\nIn this section, we will dive into how practitioners can harness the principles of hardware-software co-design at the lowest levels of a model to facilitate compatibility with edge devices. Kicking off with an introduction to the numerics, we will examine its implications for device memory and computational complexity. Subsequently, we will embark on a discussion regarding the trade-offs entailed in adopting this strategy, followed by a deep dive into a paramount method of efficient numerics: quantization.\n\n\n9.3.1 The Basics\n\nTypes\nNumerical data, the bedrock upon which machine learning models stand, manifest in two primary forms. These are integers and floating point numbers.\nIntegers: Whole numbers, devoid of fractional components, integers (e.g., -3, 0, 42) are key in scenarios demanding discrete values. For instance, in ML, class labels in a classification task might be represented as integers, where “cat”, “dog”, and “bird” could be encoded as 0, 1, and 2 respectively.\nFloating-Point Numbers: Encompassing real numbers, floating-point numbers (e.g., -3.14, 0.01, 2.71828) afford the representation of values with fractional components. In ML model parameters, weights might be initialized with small floating-point values, such as 0.001 or -0.045, to commence the training process. Currently, there are 4 popular precision formats discussed below.\nVariable bit widths: Beyond the standard widths, research is ongoing into extremely low bit-width numerics, even down to binary or ternary representations. Extremely low bit-width operations can offer significant speedups and reduce power consumption even further. While challenges remain in maintaining model accuracy with such drastic quantization, advances continue to be made in this area.\n\n\nPrecision\nPrecision, delineating the exactness with which a number is represented, bifurcates typically into single, double, half and in recent years there have been a number of other precisions that have emerged to better support machine learning tasks efficiently on the underlying hardware.\nDouble Precision (Float64): Allocating 64 bits, double precision (e.g., 3.141592653589793) provides heightened accuracy, albeit demanding augmented memory and computational resources. In scientific computations, where precision is paramount, variables like π might be represented with Float64.\nSingle Precision (Float32): With 32 bits at its disposal, single precision (e.g., 3.1415927) strikes a balance between numerical accuracy and memory conservation. In ML, Float32 might be employed to store weights during training to maintain a reasonable level of precision.\nHalf Precision (Float16): Constrained to 16 bits, half precision (e.g., 3.14) curtails memory usage and can expedite computations, albeit sacrificing numerical accuracy and range. In ML, especially during inference on resource-constrained devices, Float16 might be utilized to reduce the model’s memory footprint.\nBfloat16: Brain Floating-Point Format or Bfloat16, also employs 16 bits but allocates them differently compared to FP16: 1 bit for the sign, 8 bits for the exponent (resulting in the same number range as in float32), and 7 bits for the fraction. This format, developed by Google, prioritizes a larger exponent range over precision, making it particularly useful in deep learning applications where the dynamic range is crucial.\nFigure 9.13 illustrates the differences between the three floating-point formats: Float32, Float16, and BFloat16.\n\n\n\n\n\n\nFigure 9.13: Three floating-point formats.\n\n\n\nInteger: Integer representations are made using 8, 4, and 2 bits. They are often used during the inference phase of neural networks, where the weights and activations of the model are quantized to these lower precisions. Integer representations are deterministic and offer significant speed and memory advantages over floating-point representations. For many inference tasks, especially on edge devices, the slight loss in accuracy due to quantization is often acceptable given the efficiency gains. An extreme form of integer numerics is for binary neural networks (BNNs), where weights and activations are constrained to one of two values: either +1 or -1.\n\n\n\n\n\n\n\n\nPrecision\nPros\nCons\n\n\n\n\nFP32 (Floating Point 32-bit)\nStandard precision used in most deep learning frameworks. High accuracy due to ample representational capacity. Well-suited for training.\nHigh memory usage. Slower inference times compared to quantized models. Higher energy consumption.\n\n\nFP16 (Floating Point 16-bit)\nReduces memory usage compared to FP32. Speeds up computations on hardware that supports FP16. Often used in mixed-precision training to balance speed and accuracy.\nLower representational capacity compared to FP32. Risk of numerical instability in some models or layers.\n\n\nINT8 (8-bit Integer)\nSignificantly reduced memory footprint compared to floating-point representations. Faster inference if hardware supports INT8 computations. Suitable for many post-training quantization scenarios.\nQuantization can lead to some accuracy loss. Requires careful calibration during quantization to minimize accuracy degradation.\n\n\nINT4 (4-bit Integer)\nEven lower memory usage than INT8. Further speed-up potential for inference.\nHigher risk of accuracy loss compared to INT8. Calibration during quantization becomes more critical.\n\n\nBinary\nMinimal memory footprint (only 1 bit per parameter). Extremely fast inference due to bitwise operations. Power efficient.\nSignificant accuracy drop for many tasks. Complex training dynamics due to extreme quantization.\n\n\nTernary\nLow memory usage but slightly more than binary. Offers a middle ground between representation and efficiency.\nAccuracy might still be lower than higher precision models. Training dynamics can be complex.\n\n\n\n\n\nNumeric Encoding and Storage\nNumeric encoding, the art of transmuting numbers into a computer-amenable format, and their subsequent storage are critical for computational efficiency. For instance, floating-point numbers might be encoded using the IEEE 754 standard, which apportions bits among sign, exponent, and fraction components, thereby enabling the representation of a vast array of values with a single format. There are a few new IEEE floating point formats that have been defined specifically for AI workloads:\n\nbfloat16- A 16-bit floating point format introduced by Google. It has 8 bits for exponent, 7 bits for mantissa and 1 bit for sign. Offers a reduced precision compromise between 32-bit float and 8-bit integers. Supported on many hardware accelerators.\nposit - A configurable format that can represent different levels of precision based on exponent bits. Aims to be more efficient than IEEE 754 binary floats. Has adjustable dynamic range and precision.\nFlexpoint - A format introduced by Intel that can dynamically adjust precision across layers or within a layer. Allows tuning precision to accuracy and hardware requirements.\nBF16ALT - A proposed 16-bit format by ARM as an alternative to bfloat16. Uses additional bit in exponent to prevent overflow/underflow.\nTF32 - Introduced by Nvidia for Ampere GPUs. Uses 10 bits for exponent instead of 8 bits like FP32. Improves model training performance while maintaining accuracy.\nFP8 - 8-bit floating point format that keeps 6 bits for mantissa and 2 bits for exponent. Enables better dynamic range than integers.\n\nThe key goals of these new formats are to provide lower precision alternatives to 32-bit floats for better computational efficiency and performance on AI accelerators while maintaining model accuracy. They offer different tradeoffs in terms of precision, range and implementation cost/complexity.\n\n\n\n9.3.2 Efficiency Benefits\nNumerical efficiency matters for machine learning workloads for a number of reasons:\nComputational Efficiency: High-precision computations (like FP32 or FP64) can be slow and resource-intensive. By reducing numeric precision, one can achieve faster computation times, especially on specialized hardware that supports lower precision.\nMemory Efficiency: Storage requirements decrease with reduced numeric precision. For instance, FP16 requires half the memory of FP32. This is crucial when deploying models to edge devices with limited memory or when working with very large models.\nPower Efficiency: Lower precision computations often consume less power, which is especially important for battery-operated devices.\nNoise Introduction: Interestingly, the noise introduced by using lower precision can sometimes act as a regularizer, helping to prevent overfitting in some models.\nHardware Acceleration: Many modern AI accelerators and GPUs are optimized for lower precision operations, leveraging the efficiency benefits of such numerics.\nEfficient numerics is not just about reducing the bit-width of numbers but understanding the trade-offs between accuracy and efficiency. As machine learning models become more pervasive, especially in real-world, resource-constrained environments, the focus on efficient numerics will continue to grow. By thoughtfully selecting and leveraging the appropriate numeric precision, one can achieve robust model performance while optimizing for speed, memory, and energy.\n\n\n9.3.3 Numeric Representation Nuances\nThere are a number of nuances with numerical representations for ML that require us to have an understanding of both the theoretical and practical aspects of numerics representation, as well as a keen awareness of the specific requirements and constraints of the application domain.\n\nMemory Usage\nThe memory footprint of ML models, particularly those of considerable complexity and depth, can be substantial, thereby posing a significant challenge in both training and deployment phases. For instance, a deep neural network with 100 million parameters, represented using Float32 (32 bits or 4 bytes per parameter), would necessitate approximately 400 MB of memory just for storing the model weights. This does not account for additional memory requirements during training for storing gradients, optimizer states, and forward pass caches, which can further amplify the memory usage, potentially straining the resources on certain hardware, especially edge devices with limited memory capacity.\n\n\nImpact on Model Parameters and Weights\nThe numeric representation casts a significant impact on the storage and computational requisites of ML model parameters and weights. For instance, a model utilizing Float64 for weights will demand double the memory and potentially increased computational time compared to a counterpart employing Float32. A weight matrix, for instance, with dimensions [1000, 1000] using Float64 would consume approximately 8MB of memory, whereas using Float32 would halve this to approximately 4MB.\n\n\nComputational Complexity\nNumerical precision directly impacts computational complexity, influencing the time and resources required to perform arithmetic operations. For example, operations using Float64 generally consume more computational resources than their Float32 or Float16 counterparts (see Figure 9.14). In the realm of ML, where models might need to process millions of operations (e.g., multiplications and additions in matrix operations during forward and backward passes), even minor differences in the computational complexity per operation can aggregate into a substantial impact on training and inference times. As shown in Figure 9.15, quantized models can be many times faster than their unquantized versions.\n\n\n\n\n\n\nFigure 9.14: Energy use by quantized operations. Credit: Mark Horowitz, Stanford University.\n\n\n\n\n\n\n\n\n\nFigure 9.15: Speed of three different models in normal and quantized form.\n\n\n\nIn addition to pure runtimes, there is also a concern over energy efficiency. Not all numerical computations are created equal from the underlying hardware standpoint. Some numerical operations are more energy efficient than others. For example, Figure 9.16 below shows that integer addition is much more energy efficient than integer multiplication.\n\n\n\n\n\n\nFigure 9.16: Energy use by quantized operations. Credit: Isscc (2014).\n\n\nIsscc. 2014. “Computing’s Energy Problem (and What We Can Do about It).” https://ieeexplore.ieee.org/document/6757323.\n\n\n\n\nHardware Compatibility\nEnsuring compatibility and optimized performance across diverse hardware platforms is another challenge in numerics representation. Different hardware, such as CPUs, GPUs, TPUs, and FPGAs, have varying capabilities and optimizations for handling different numeric precisions. For example, certain GPUs might be optimized for Float32 computations, while others might provide accelerations for Float16. Developing and optimizing ML models that can leverage the specific numerical capabilities of different hardware, while ensuring that the model maintains its accuracy and robustness, requires careful consideration and potentially additional development and testing efforts.\n\n\nPrecision and Accuracy Trade-offs\nThe trade-off between numerical precision and model accuracy is a nuanced challenge in numerics representation. Utilizing lower-precision numerics, such as Float16, might conserve memory and expedite computations but can also introduce issues like quantization error and reduced numerical range. For instance, training a model with Float16 might introduce challenges in representing very small gradient values, potentially impacting the convergence and stability of the training process. Furthermore, in certain applications, such as scientific simulations or financial computations, where high precision is paramount, the use of lower-precision numerics might not be permissible due to the risk of accruing significant errors.\n\n\nTrade-off Examples\nTo understand and appreciate the nuances let’s consider some use case examples. Through these we will realize that the choice of numeric representation is not merely a technical decision but a strategic one, influencing the model’s predictive acumen, its computational demands, and its deployability across diverse computational environments. In this section we will look at a couple of examples to better understand the trade-offs with numerics and how they tie to the real world.\n\nAutonomous Vehicles\nIn the domain of autonomous vehicles, ML models are employed to interpret sensor data and make real-time decisions. The models must process high-dimensional data from various sensors (e.g., LiDAR, cameras, radar) and execute numerous computations within a constrained time frame to ensure safe and responsive vehicle operation. So the trade-offs here would include:\n\nMemory Usage: Storing and processing high-resolution sensor data, especially in floating-point formats, can consume substantial memory.\nComputational Complexity: Real-time processing demands efficient computations, where higher-precision numerics might impede the timely execution of control actions.\n\n\n\nMobile Health Applications\nMobile health applications often utilize ML models for tasks like activity recognition, health monitoring, or predictive analytics, operating within the resource-constrained environment of mobile devices. The trade-offs here would include:\n\nPrecision and Accuracy Trade-offs: Employing lower-precision numerics to conserve resources might impact the accuracy of health predictions or anomaly detections, which could have significant implications for user health and safety.\nHardware Compatibility: Models need to be optimized for diverse mobile hardware, ensuring efficient operation across a wide range of devices with varying numerical computation capabilities.\n\n\n\nHigh-Frequency Trading (HFT) Systems\nHFT systems leverage ML models to make rapid trading decisions based on real-time market data. These systems demand extremely low-latency responses to capitalize on short-lived trading opportunities.\n\nComputational Complexity: The models must process and analyze vast streams of market data with minimal latency, where even slight delays, potentially introduced by higher-precision numerics, can result in missed opportunities.\nPrecision and Accuracy Trade-offs: Financial computations often demand high numerical precision to ensure accurate pricing and risk assessments, posing challenges in balancing computational efficiency and numerical accuracy.\n\n\n\nEdge-Based Surveillance Systems\nSurveillance systems deployed on edge devices, like security cameras, utilize ML models for tasks like object detection, activity recognition, and anomaly detection, often operating under stringent resource constraints.\n\nMemory Usage: Storing pre-trained models and processing video feeds in real-time demands efficient memory usage, which can be challenging with high-precision numerics.\nHardware Compatibility: Ensuring that models can operate efficiently on edge devices with varying hardware capabilities and optimizations for different numeric precisions is crucial for widespread deployment.\n\n\n\nScientific Simulations\nML models are increasingly being utilized in scientific simulations, such as climate modeling or molecular dynamics simulations, to enhance predictive capabilities and reduce computational demands.\n\nPrecision and Accuracy Trade-offs: Scientific simulations often require high numerical precision to ensure accurate and reliable results, which can conflict with the desire to reduce computational demands via lower-precision numerics.\nComputational Complexity: The models must manage and process complex, high-dimensional simulation data efficiently to ensure timely results and enable large-scale or long-duration simulations.\n\nThese examples illustrate diverse scenarios where the challenges of numerics representation in ML models are prominently manifested. Each system presents a unique set of requirements and constraints, necessitating tailored strategies and solutions to navigate the challenges of memory usage, computational complexity, precision-accuracy trade-offs, and hardware compatibility.\n\n\n\n\n9.3.4 Quantization\nQuantization is prevalent in various scientific and technological domains, and it essentially involves the mapping or constraining of a continuous set or range into a discrete counterpart to minimize the number of bits required.\n\nHistory\nHistorically, the idea of quantization is not novel and can be traced back to ancient times, particularly in the realm of music and astronomy. In music, the Greeks utilized a system of tetrachords, segmenting the continuous range of pitches into discrete notes, thereby quantizing musical sounds. In astronomy and physics, the concept of quantization was present in the discretized models of planetary orbits, as seen in the Ptolemaic and Copernican systems.\nDuring the 1800s, quantization-based discretization was used to approximate the calculation of integrals, and further used to investigate the impact of rounding errors on the integration result. With algorithms, Lloyd’s K-Means Algorithm is a classic example of quantization. However, the term “quantization” was firmly embedded in scientific literature with the advent of quantum mechanics in the early 20th century, where it was used to describe the phenomenon that certain physical properties, such as energy, exist only in discrete, quantized states. This principle was pivotal in explaining phenomena at the atomic and subatomic levels. In the digital age, quantization found its application in signal processing, where continuous signals are converted into a discrete digital form, and in numerical algorithms, where computations on real-valued numbers are performed with finite-precision arithmetic.\nExtending upon this second application and relevant to this section, it is used in computer science to optimize neural networks by reducing the precision of the network weights. Thus, quantization, as a concept, has been subtly woven into the tapestry of scientific and technological development, evolving and adapting to the needs and discoveries of various epochs.\n\n\nInitial Breakdown\nWe begin our foray into quantization with a brief analysis of one important use for quantization.\nIn signal processing, the continuous sine wave (shown in Figure 9.17) can be quantized into discrete values through a process known as sampling. This is a fundamental concept in digital signal processing and is crucial for converting analog signals (like the continuous sine wave) into a digital form that can be processed by computers. The sine wave is a prevalent example due to its periodic and smooth nature, making it a useful tool for explaining concepts like frequency, amplitude, phase, and, of course, quantization.\n\n\n\n\n\n\nFigure 9.17: Sine Wave.\n\n\n\nIn the quantized version shown in Figure 9.18, the continuous sine wave (Figure 9.17) is sampled at regular intervals (in this case, every \\(\\frac{\\pi}{4}\\) radians), and only these sampled values are represented in the digital version of the signal. The step-wise lines between the points show one way to represent the quantized signal in a piecewise-constant form. This is a simplified example of how analog-to-digital conversion works, where a continuous signal is mapped to a discrete set of values, enabling it to be represented and processed digitally.\n\n\n\n\n\n\nFigure 9.18: Quantized Sine Wave.\n\n\n\nReturning to the context of Machine Learning (ML), quantization refers to the process of constraining the possible values that numerical parameters (such as weights and biases) can take to a discrete set, thereby reducing the precision of the parameters and consequently, the model’s memory footprint. When properly implemented, quantization can reduce model size by up to 4x and improve inference latency and throughput by up to 2-3x. Figure 9.19 illustrates the impact that quantization has on different models’ sizes: for example, an Image Classification model like ResNet-v2 can be compressed from 180MB down to 45MB with 8-bit quantization. There is typically less than 1% loss in model accuracy from well tuned quantization. Accuracy can often be recovered by re-training the quantized model with quantization-aware training techniques. Therefore, this technique has emerged to be very important in deploying ML models to resource-constrained environments, such as mobile devices, IoT devices, and edge computing platforms, where computational resources (memory and processing power) are limited.\n\n\n\n\n\n\nFigure 9.19: Effect of quantization on model sizes. Credit: HarvardX.\n\n\n\nThere are several dimensions to quantization such as uniformity, stochasticity (or determinism), symmetry, granularity (across layers/channels/groups or even within channels), range calibration considerations (static vs dynamic), and fine-tuning methods (QAT, PTQ, ZSQ). We examine these below.\n\n\n\n9.3.5 Types\n\nUniform Quantization\nUniform quantization involves mapping continuous or high-precision values to a lower-precision representation using a uniform scale. This means that the interval between each possible quantized value is consistent. For example, if weights of a neural network layer are quantized to 8-bit integers (values between 0 and 255), a weight with a floating-point value of 0.56 might be mapped to an integer value of 143, assuming a linear mapping between the original and quantized scales. Due to its use of integer or fixed-point math pipelines, this form of quantization allows computation on the quantized domain without the need to dequantize beforehand.\nThe process for implementing uniform quantization starts with choosing a range of real numbers to be quantized. The next step is to select a quantization function and map the real values to the integers representable by the bit-width of the quantized representation. For instance, a popular choice for a quantization function is:\n\\[\nQ(r)=Int(r/S) - Z\n\\]\nwhere Q is the quantization operator, r is a real valued input (in our case, an activation or weight), S is a real valued scaling factor, and Z is an integer zero point. The Int function maps a real value to an integer value through a rounding operation. Through this function, we have effectively mapped real values r to some integer values, resulting in quantized levels which are uniformly spaced.\nWhen the need arises for practitioners to retrieve the original higher precision values, real values r can be recovered from quantized values through an operation known as dequantization. In the example above, this would mean performing the following operation on our quantized value:\n\\[\n\\bar{r} = S(Q(r) + Z)\n\\]\nAs discussed, some precision in the real value is lost by quantization. In this case, the recovered value \\(\\bar{r}\\) will not exactly match r due to the rounding operation. This is an important tradeoff to note; however, in many successful uses of quantization, the loss of precision can be negligible and the test accuracy remains high. Despite this, uniform quantization continues to be the current de-facto choice due to its simplicity and efficient mapping to hardware.\n\n\nNon-uniform Quantization\nNon-uniform quantization, on the other hand, does not maintain a consistent interval between quantized values. This approach might be used to allocate more possible discrete values in regions where the parameter values are more densely populated, thereby preserving more detail where it is most needed. For instance, in bell-shaped distributions of weights with long tails, a set of weights in a model predominantly lies within a certain range; thus, more quantization levels might be allocated to that range to preserve finer details, enabling us to better capture information. However, one major weakness of non-uniform quantization is that it requires dequantization before higher precision computations due to its non-uniformity, restricting its ability to accelerate computation compared to uniform quantization.\nTypically, a rule-based non-uniform quantization uses a logarithmic distribution of exponentially increasing steps and levels as opposed to linearly. Another popular branch lies in binary-code-based quantization where real number vectors are quantized into binary vectors with a scaling factor. Notably, there is no closed form solution for minimizing errors between the real value and non-uniformly quantized value, so most quantizations in this field rely on heuristic solutions. For instance, recent work by Xu et al. (2018) formulates non-uniform quantization as an optimization problem where the quantization steps/levels in quantizer Q are adjusted to minimize the difference between the original tensor and quantized counterpart.\n\nXu, Chen, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang, and Hongbin Zha. 2018. “Alternating Multi-Bit Quantization for Recurrent Neural Networks.” In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=S19dR9x0b.\n\\[\n\\min_Q ||Q(r)-r||^2\n\\]\nFurthermore, learnable quantizers can be jointly trained with model parameters, and the quantization steps/levels are generally trained with iterative optimization or gradient descent. Additionally, clustering has been used to alleviate information loss from quantization. While capable of capturing higher levels of detail, non-uniform quantization schemes can be difficult to deploy efficiently on general computation hardware, making it less-preferred to methods which use uniform quantization.\n\n\n\n\n\n\nFigure 9.20: Quantization uniformity. Credit: Gholami et al. (2021).\n\n\n\n\n\nStochastic Quantization\nUnlike the two previous approaches which generate deterministic mappings, there is some work exploring the idea of stochastic quantization for quantization-aware training and reduced precision training. This approach maps floating numbers up or down with a probability associated to the magnitude of the weight update. The hope generated by high level intuition is that such a probabilistic approach may allow a neural network to explore more, as compared to deterministic quantization. Supposedly, enabling a stochastic rounding may allow neural networks to escape local optimums, thereby updating its parameters. Below are two example stochastic mapping functions:\n\n\n\n\n\n\n\nFigure 9.21: Integer vs Binary quantization functions.\n\n\n\n\n\nZero Shot Quantization\nZero-shot quantization refers to the process of converting a full-precision deep learning model directly into a low-precision, quantized model without the need for any retraining or fine-tuning on the quantized model. The primary advantage of this approach is its efficiency, as it eliminates the often time-consuming and resource-intensive process of retraining a model post-quantization. By leveraging techniques that anticipate and minimize quantization errors, zero-shot quantization aims to maintain the model’s original accuracy even after reducing its numerical precision. It is particularly useful for Machine Learning as a Service (MLaaS) providers aiming to expedite the deployment of their customer’s workloads without having to access their datasets.\n\n\n\n9.3.6 Calibration\nCalibration is the process of selecting the most effective clipping range [\\(\\alpha\\), \\(\\beta\\)] for weights and activations to be quantized to. For example, consider quantizing activations that originally have a floating-point range between -6 and 6 to 8-bit integers. If you just take the minimum and maximum possible 8-bit integer values (-128 to 127) as your quantization range, it might not be the most effective. Instead, calibration would involve passing a representative dataset then use this observed range for quantization.\nThere are many calibration methods but a few commonly used include:\n\nMax: Use the maximum absolute value seen during calibration. However, this method is susceptible to outlier data. Notice how in Figure 9.22, we have an outlier cluster around 2.1, while the rest are clustered around smaller values.\nEntropy: Use KL divergence to minimize information loss between the original floating-point values and values that could be represented by the quantized format. This is the default method used by TensorRT.\nPercentile: Set the range to a percentile of the distribution of absolute values seen during calibration. For example, 99% calibration would clip 1% of the largest magnitude values.\n\n\n\n\n\n\n\nFigure 9.22: Input activations to layer 3 in ResNet50. Credit: @Wu, Judd, and Isaev (2020).\n\n\n\nImportantly, the quality of calibration can make a difference between a quantized model that retains most of its accuracy and one that degrades significantly. Hence, it’s an essential step in the quantization process. When choosing a calibration range, there are two types: symmetric and asymmetric.\n\nSymmetric Quantization\nSymmetric quantization maps real values to a symmetrical clipping range centered around 0. This involves choosing a range [\\(\\alpha\\), \\(\\beta\\)] where \\(\\alpha = -\\beta\\). For example, one symmetrical range would be based on the min/max values of the real values such that: -\\(\\alpha = \\beta = max(abs(r_{max}), abs(r_{min}))\\).\nSymmetric clipping ranges are the most widely adopted in practice as they have the advantage of easier implementation. In particular, the mapping of zero to zero in the clipping range (sometimes called “zeroing out of the zero point”) can lead to reduction in computational cost during inference (Wu, Judd, and Isaev (2020)).\n\n\nAsymmetric Quantization\nAsymmetric quantization maps real values to an asymmetrical clipping range that isn’t necessarily centered around 0, as shown in Figure 9.23 on the right. It involves choosing a range [\\(\\alpha\\), \\(\\beta\\)] where \\(\\alpha \\neq -\\beta\\). For example, selecting a range based on the minimum and maximum real values, or where \\(\\alpha = r_{min}\\) and \\(\\beta = r_{max}\\), creates an asymmetric range. Typically, asymmetric quantization produces tighter clipping ranges compared to symmetric quantization, which is important when target weights and activations are imbalanced, e.g., the activation after the ReLU always has non-negative values. Despite producing tighter clipping ranges, asymmetric quantization is less preferred to symmetric quantization as it doesn’t always zero out the real value zero.\n\n\n\n\n\n\nFigure 9.23: Quantization (a)symmetry. Credit: Gholami et al. (2021).\n\n\n\n\n\nGranularity\nUpon deciding the type of clipping range, it is essential to tighten the range to allow a model to retain as much of its accuracy as possible. We’ll be taking a look at convolutional neural networks as our way of exploring methods that fine tune the granularity of clipping ranges for quantization. The input activation of a layer in our CNN undergoes convolution with multiple convolutional filters. Every convolutional filter can possess a unique range of values. Notice how in Figure 9.24, the range for Filter1 is much smaller than that for Filter 3. Consequently, one distinguishing feature of quantization approaches is the precision with which the clipping range [α,β] is determined for the weights.\n\n\n\n\n\n\nFigure 9.24: Quantization granularity: variable ranges. Credit: Gholami et al. (2021).\n\n\n\n\nLayerwise Quantization: This approach determines the clipping range by considering all of the weights in the convolutional filters of a layer. Then, the same clipping range is used for all convolutional filters. It’s the simplest to implement, and, as such, it often results in sub-optimal accuracy due the wide variety of differing ranges between filters. For example, a convolutional kernel with a narrower range of parameters loses its quantization resolution due to another kernel in the same layer having a wider range.\nGroupwise Quantization: This approach groups different channels inside a layer to calculate the clipping range. This method can be helpful when the distribution of parameters across a single convolution/activation varies a lot. In practice, this method was useful in Q-BERT (Shen et al. 2020) for quantizing Transformer (Vaswani et al. 2017) models that consist of fully-connected attention layers. The downside with this approach comes with the extra cost of accounting for different scaling factors.\nChannelwise Quantization: This popular method uses a fixed range for each convolutional filter that is independent of other channels. Because each channel is assigned a dedicated scaling factor, this method ensures a higher quantization resolution and often results in higher accuracy.\nSub-channelwise Quantization: Taking channelwise quantization to the extreme, this method determines the clipping range with respect to any groups of parameters in a convolution or fully-connected layer. It may result in considerable overhead since different scaling factors need to be taken into account when processing a single convolution or fully-connected layer.\n\n\nShen, Sheng, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2020. “Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.” Proceedings of the AAAI Conference on Artificial Intelligence 34 (05): 8815–21. https://doi.org/10.1609/aaai.v34i05.6409.\n\nVaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.\nOf these, channelwise quantization is the current standard used for quantizing convolutional kernels, since it enables the adjustment of clipping ranges for each individual kernel with negligible overhead.\n\n\nStatic and Dynamic Quantization\nAfter determining the type and granularity of the clipping range, practitioners must decide when ranges are determined in their range calibration algorithms. There are two approaches to quantizing activations: static quantization and dynamic quantization.\nStatic quantization is the most frequently used approach. In this, the clipping range is pre-calculated and static during inference. It does not add any computational overhead, but, consequently, results in lower accuracy as compared to dynamic quantization. A popular method of implementing this is to run a series of calibration inputs to compute the typical range of activations [Quantization and training of neural networks for efficient integer-arithmetic-only inference, Dyadic neural network quantization].\nDynamic quantization is an alternative approach which dynamically calculates the range for each activation map during runtime. The approach requires real-time computations which might have a very high overhead. By doing this, dynamic quantization often achieves the highest accuracy as the range is calculated specifically for each input.\nBetween the two, calculating the range dynamically usually is very costly, so most practitioners will often use static quantization instead.\n\n\n\n9.3.7 Techniques\nThe two prevailing techniques for quantizing models are Post Training Quantization and Quantization-Aware Training.\nPost Training Quantization - Post-training quantization (PTQ) is a quantization technique where the model is quantized after it has been trained. The model is trained in floating point and then weights and activations are quantized as a post-processing step. This is the simplest approach and does not require access to the training data. Unlike Quantization-Aware Training (QAT), PTQ sets weight and activation quantization parameters directly, making it low-overhead and suitable for limited or unlabeled data situations. However, not readjusting the weights after quantizing, especially in low-precision quantization can lead to very different behavior and thus lower accuracy. To tackle this, techniques like bias correction, equalizing weight ranges, and adaptive rounding methods have been developed. PTQ can also be applied in zero-shot scenarios, where no training or testing data are available. This method has been made even more efficient to benefit compute- and memory- intensive large language models. Recently, SmoothQuant, a training-free, accuracy-preserving, and general-purpose PTQ solution which enables 8-bit weight, 8-bit activation quantization for LLMs, has been developed, demonstrating up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy (Xiao et al. (2022)).\nIn PTQ, a pretrained model undergoes a calibration process, as shown in Figure 9.25. Calibration involves using a separate dataset known as calibration data, a specific subset of the training data reserved for quantization to help find the appropriate clipping ranges and scaling factors.\n\n\n\n\n\n\nFigure 9.25: Post-Training Quantization and calibration. Credit: Gholami et al. (2021).\n\n\n\nQuantization-Aware Training - Quantization-aware training (QAT) is a fine-tuning of the PTQ model. The model is trained aware of quantization, allowing it to adjust for quantization effects. This produces better accuracy with quantized inference. Quantizing a trained neural network model with methods such as PTQ introduces perturbations that can deviate the model from its original convergence point. For instance, Krishnamoorthi showed that even with per-channel quantization, networks like MobileNet do not reach baseline accuracy with int8 Post Training Quantization (PTQ) and require Quantization-Aware Training (QAT) (Krishnamoorthi (2018)).To address this, QAT retrains the model with quantized parameters, employing forward and backward passes in floating point but quantizing parameters after each gradient update. Handling the non-differentiable quantization operator is crucial; a widely used method is the Straight Through Estimator (STE), approximating the rounding operation as an identity function. While other methods and variations exist, STE remains the most commonly used due to its practical effectiveness. In QAT, a pretrained model is quantized and then finetuned using training data to adjust parameters and recover accuracy degradation, as shown in Figure 9.26. The calibration process is often conducted in parallel with the finetuning process for QAT.\n\n\n\n\n\n\nFigure 9.26: Quantization-Aware Training. Credit: Gholami et al. (2021).\n\n\nGholami, Dong Kim, Mahoney Yao, and Keutzer. 2021. “A Survey of Quantization Methods for Efficient Neural Network Inference).” ArXiv Preprint. https://arxiv.org/abs/2103.13630.\n\n\nQuantization-Aware Training serves as a natural extension of Post-Training Quantization. Following the initial quantization performed by PTQ, QAT is used to further refine and fine-tune the quantized parameters - see how in Figure 9.27, the PTQ model undergoes an additional step, QAT. It involves a retraining process where the model is exposed to additional training iterations using the original data. This dynamic training approach allows the model to adapt and adjust its parameters, compensating for the performance degradation caused by quantization.\n\n\n\n\n\n\nFigure 9.27: PTQ and QAT. Credit: “The Ultimate Guide to Deep Learning Model Quantization and Quantization-Aware Training” (n.d.).\n\n\n“The Ultimate Guide to Deep Learning Model Quantization and Quantization-Aware Training.” n.d. https://deci.ai/quantization-and-quantization-aware-training/.\n\n\nFigure 9.28 shows the relative accuracy of different models after PTQ and QAT. In almost all cases, QAT yields a better accuracy than PTQ. Consider for example EfficientNet b0. After PTQ, the accuracy drops from 76.85% to 72.06%. But when we apply QAT, the accuracy rebounds to 76.95% (with even a slight improvement over the original accuracy).\n\n\n\n\n\n\nFigure 9.28: Relative accuracies of PTQ and QAT. Credit: Wu, Judd, and Isaev (2020).\n\n\n\n\n\n\n\n\n\n\n\n\nFeature/Technique\nPost Training Quantization\nQuantization-Aware Training\nDynamic Quantization\n\n\n\n\nPros\n\n\n\n\n\nSimplicity\n✓\n✗\n✗\n\n\nAccuracy Preservation\n✗\n✓\n✓\n\n\nAdaptability\n✗\n✗\n✓\n\n\nOptimized Performance\n✗\n✓\nPotentially\n\n\nCons\n\n\n\n\n\nAccuracy Degradation\n✓\n✗\nPotentially\n\n\nComputational Overhead\n✗\n✓\n✓\n\n\nImplementation Complexity\n✗\n✓\n✓\n\n\nTradeoffs\n\n\n\n\n\nSpeed vs. Accuracy\n✓\n✗\n✗\n\n\nAccuracy vs. Cost\n✗\n✓\n✗\n\n\nAdaptability vs. Overhead\n✗\n✗\n✓\n\n\n\n\n\n9.3.8 Weights vs. Activations\nWeight Quantization: Involves converting the continuous or high-precision weights of a model to lower-precision, such as converting Float32 weights to quantized INT8 (integer) weights - in Figure 9.29, weight quantization is taking place in the second step (red squares) when we multiply the inputs. This reduces the model size, thereby reducing the memory required to store the model and the computational resources needed to perform inference. For example, consider a weight matrix in a neural network layer with Float32 weights as [0.215, -1.432, 0.902, …]. Through weight quantization, these might be mapped to INT8 values like [27, -183, 115, …], significantly reducing the memory required to store them.\n\n\n\n\n\n\nFigure 9.29: Weight and activation quantization. Credit: HarvardX.\n\n\n\nActivation Quantization: Involves quantizing the activation values (outputs of layers) during model inference. This can reduce the computational resources required during inference, but it introduces additional challenges in maintaining model accuracy due to the reduced precision of intermediate computations. For example, in a convolutional neural network (CNN), the activation maps (feature maps) produced by convolutional layers, originally in Float32, might be quantized to INT8 during inference to accelerate computation, especially on hardware optimized for integer arithmetic. Additionally, recent work has explored the use of Activation-aware Weight Quantization for LLM compression and acceleration, which involves protecting only 1% of the most important salient weights by observing the activations not weights (Lin et al. (2023)).\n\n\n9.3.9 Trade-offs\nQuantization invariably introduces a trade-off between model size/performance and accuracy. While it significantly reduces the memory footprint and can accelerate inference, especially on hardware optimized for low-precision arithmetic, the reduced precision can degrade model accuracy.\nModel Size: A model with weights represented as Float32 being quantized to INT8 can theoretically reduce the model size by a factor of 4, enabling it to be deployed on devices with limited memory. The model size of large language models is developing at a faster pace than the GPU memory in recent years, leading to a big gap between the supply and demand for memory. Figure 9.30 illustrates the recent trend of the widening gap between model size (red line) and acceleartor memory (yellow line). Quantization and model compression techniques can help bridge the gap\n\n\n\n\n\n\nFigure 9.30: Model size vs. accelerator memory. Credit: Xiao et al. (2022).\n\n\nXiao, Seznec Lin, Demouth Wu, and Han. 2022. “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.” ArXiv Preprint. https://arxiv.org/abs/2211.10438.\n\n\nInference Speed: Quantization can also accelerate inference, as lower-precision arithmetic is computationally less expensive. For example, certain hardware accelerators, like Google’s Edge TPU, are optimized for INT8 arithmetic and can perform inference significantly faster with INT8 quantized models compared to their floating-point counterparts. The reduction in memory from quantization helps reduce the amount of data transmission, saving up memory and speeding the process. Figure 9.31 compares the increase in throughput and the reduction in bandwidth memory for different data type on the NVIDIA Turing GPU.\n\n\n\n\n\n\nFigure 9.31: Benefits of lower precision data types. Credit: Wu, Judd, and Isaev (2020).\n\n\nWu, Zhang Judd, and Micikevicius Isaev. 2020. “Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation).” ArXiv Preprint. https://arxiv.org/abs/2004.09602.\n\n\nAccuracy: The reduction in numerical precision post-quantization can lead to a degradation in model accuracy, which might be acceptable in certain applications (e.g., image classification) but not in others (e.g., medical diagnosis). Therefore, post-quantization, the model typically requires re-calibration or fine-tuning to mitigate accuracy loss. Furthermore, recent work has explored the use of Activation-aware Weight Quantization (Lin et al. (2023)) which is based on the observation that protecting only 1% of salient weights can greatly reduce quantization error.\n\n\n9.3.10 Quantization and Pruning\nPruning and quantization work well together, and it’s been found that pruning doesn’t hinder quantization. In fact, pruning can help reduce quantization error. Intuitively, this is due to pruning reducing the number of weights to quantize, thereby reducing the accumulated error from quantization. For example, an unpruned AlexNet has 60 million weights to quantize whereas a pruned AlexNet only has 6.7 million weights to quantize. This significant drop in weights helps reduce the error between quantizing the unpruned AlexNet vs. the pruned AlexNet. Furthermore, recent work has found that quantization-aware pruning generates more computationally efficient models than either pruning or quantization alone; It typically performs similar to or better in terms of computational efficiency compared to other neural architecture search techniques like Bayesian optimization (Hawks et al. (2021)).\n\n\n\n\n\n\nFigure 9.32: Accuracy vs. compression rate under different compression methods. Credit: Han, Mao, and Dally (2015).\n\n\nHan, Song, Huizi Mao, and William J Dally. 2015. “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding.” arXiv Preprint arXiv:1510.00149.\n\n\n\n\n9.3.11 Edge-aware Quantization\nQuantization not only reduces model size but also enables faster computations and draws less power, making it vital to edge development. Edge devices typically have tight resource constraints with compute, memory, and power, which are impossible to meet for many of the deep NN models of today. Furthermore, edge processors do not support floating point operations, making integer quantization particularly important for chips like GAP-8, a RISC-V SoC for edge inference with a dedicated CNN accelerator, which only support integer arithmetic..\nOne hardware platform utilizing quantization is the ARM Cortex-M group of 32-bit RISC ARM processor cores. They leverage fixed-point quantization with power of two scaling factors so that quantization and dequantization can be efficiently done by bit shifting. Additionally, Google Edge TPUs, Google’s emerging solution for running inference at the edge, is designed for small, low-powered devices and can only support 8-bit arithmetic. Many complex neural network models that could only be deployed on servers due to their high computational needs can now be run on edge devices thanks to recent advancements (e.g. quantization methods) in edge computing field.\nIn addition to being an indispensable technique for many edge processors, quantization has also brought noteworthy improvements to non-edge processors such as encouraging such processors to meet the Service Level Agreement (SLA) requirements such as 99th percentile latency.\nThus, quantization combined with efficient low-precision logic and dedicated deep learning accelerators, has been one crucial driving force for the evolution of such edge processors.\nThe video below is a lecture on quantization and the different quantization methods.", + "text": "9.3 Efficient Numerics Representation\nNumerics representation involves a myriad of considerations, including, but not limited to, the precision of numbers, their encoding formats, and the arithmetic operations facilitated. It invariably involves a rich array of different trade-offs, where practitioners are tasked with navigating between numerical accuracy and computational efficiency. For instance, while lower-precision numerics may offer the allure of reduced memory usage and expedited computations, they concurrently present challenges pertaining to numerical stability and potential degradation of model accuracy.\n\nMotivation\nThe imperative for efficient numerics representation arises, particularly as efficient model optimization alone falls short when adapting models for deployment on low-powered edge devices operating under stringent constraints.\nBeyond minimizing memory demands, the tremendous potential of efficient numerics representation lies in, but is not limited to, these fundamental ways. By diminishing computational intensity, efficient numerics can thereby amplify computational speed, allowing more complex models to compute on low-powered devices. Reducing the bit precision of weights and activations on heavily over-parameterized models enables condensation of model size for edge devices without significantly harming the model’s predictive accuracy. With the omnipresence of neural networks in models, efficient numerics has a unique advantage in leveraging the layered structure of NNs to vary numeric precision across layers, minimizing precision in resistant layers while preserving higher precision in sensitive layers.\nIn this section, we will dive into how practitioners can harness the principles of hardware-software co-design at the lowest levels of a model to facilitate compatibility with edge devices. Kicking off with an introduction to the numerics, we will examine its implications for device memory and computational complexity. Subsequently, we will embark on a discussion regarding the trade-offs entailed in adopting this strategy, followed by a deep dive into a paramount method of efficient numerics: quantization.\n\n\n9.3.1 The Basics\n\nTypes\nNumerical data, the bedrock upon which machine learning models stand, manifest in two primary forms. These are integers and floating point numbers.\nIntegers: Whole numbers, devoid of fractional components, integers (e.g., -3, 0, 42) are key in scenarios demanding discrete values. For instance, in ML, class labels in a classification task might be represented as integers, where “cat”, “dog”, and “bird” could be encoded as 0, 1, and 2 respectively.\nFloating-Point Numbers: Encompassing real numbers, floating-point numbers (e.g., -3.14, 0.01, 2.71828) afford the representation of values with fractional components. In ML model parameters, weights might be initialized with small floating-point values, such as 0.001 or -0.045, to commence the training process. Currently, there are 4 popular precision formats discussed below.\nVariable bit widths: Beyond the standard widths, research is ongoing into extremely low bit-width numerics, even down to binary or ternary representations. Extremely low bit-width operations can offer significant speedups and reduce power consumption even further. While challenges remain in maintaining model accuracy with such drastic quantization, advances continue to be made in this area.\n\n\nPrecision\nPrecision, delineating the exactness with which a number is represented, bifurcates typically into single, double, half and in recent years there have been a number of other precisions that have emerged to better support machine learning tasks efficiently on the underlying hardware.\nDouble Precision (Float64): Allocating 64 bits, double precision (e.g., 3.141592653589793) provides heightened accuracy, albeit demanding augmented memory and computational resources. In scientific computations, where precision is paramount, variables like π might be represented with Float64.\nSingle Precision (Float32): With 32 bits at its disposal, single precision (e.g., 3.1415927) strikes a balance between numerical accuracy and memory conservation. In ML, Float32 might be employed to store weights during training to maintain a reasonable level of precision.\nHalf Precision (Float16): Constrained to 16 bits, half precision (e.g., 3.14) curtails memory usage and can expedite computations, albeit sacrificing numerical accuracy and range. In ML, especially during inference on resource-constrained devices, Float16 might be utilized to reduce the model’s memory footprint.\nBfloat16: Brain Floating-Point Format or Bfloat16, also employs 16 bits but allocates them differently compared to FP16: 1 bit for the sign, 8 bits for the exponent (resulting in the same number range as in float32), and 7 bits for the fraction. This format, developed by Google, prioritizes a larger exponent range over precision, making it particularly useful in deep learning applications where the dynamic range is crucial.\nFigure 9.13 illustrates the differences between the three floating-point formats: Float32, Float16, and BFloat16.\n\n\n\n\n\n\nFigure 9.13: Three floating-point formats.\n\n\n\nInteger: Integer representations are made using 8, 4, and 2 bits. They are often used during the inference phase of neural networks, where the weights and activations of the model are quantized to these lower precisions. Integer representations are deterministic and offer significant speed and memory advantages over floating-point representations. For many inference tasks, especially on edge devices, the slight loss in accuracy due to quantization is often acceptable given the efficiency gains. An extreme form of integer numerics is for binary neural networks (BNNs), where weights and activations are constrained to one of two values: either +1 or -1.\n\n\n\n\n\n\n\n\nPrecision\nPros\nCons\n\n\n\n\nFP32 (Floating Point 32-bit)\nStandard precision used in most deep learning frameworks. High accuracy due to ample representational capacity. Well-suited for training.\nHigh memory usage. Slower inference times compared to quantized models. Higher energy consumption.\n\n\nFP16 (Floating Point 16-bit)\nReduces memory usage compared to FP32. Speeds up computations on hardware that supports FP16. Often used in mixed-precision training to balance speed and accuracy.\nLower representational capacity compared to FP32. Risk of numerical instability in some models or layers.\n\n\nINT8 (8-bit Integer)\nSignificantly reduced memory footprint compared to floating-point representations. Faster inference if hardware supports INT8 computations. Suitable for many post-training quantization scenarios.\nQuantization can lead to some accuracy loss. Requires careful calibration during quantization to minimize accuracy degradation.\n\n\nINT4 (4-bit Integer)\nEven lower memory usage than INT8. Further speed-up potential for inference.\nHigher risk of accuracy loss compared to INT8. Calibration during quantization becomes more critical.\n\n\nBinary\nMinimal memory footprint (only 1 bit per parameter). Extremely fast inference due to bitwise operations. Power efficient.\nSignificant accuracy drop for many tasks. Complex training dynamics due to extreme quantization.\n\n\nTernary\nLow memory usage but slightly more than binary. Offers a middle ground between representation and efficiency.\nAccuracy might still be lower than higher precision models. Training dynamics can be complex.\n\n\n\n\n\nNumeric Encoding and Storage\nNumeric encoding, the art of transmuting numbers into a computer-amenable format, and their subsequent storage are critical for computational efficiency. For instance, floating-point numbers might be encoded using the IEEE 754 standard, which apportions bits among sign, exponent, and fraction components, thereby enabling the representation of a vast array of values with a single format. There are a few new IEEE floating point formats that have been defined specifically for AI workloads:\n\nbfloat16- A 16-bit floating point format introduced by Google. It has 8 bits for exponent, 7 bits for mantissa and 1 bit for sign. Offers a reduced precision compromise between 32-bit float and 8-bit integers. Supported on many hardware accelerators.\nposit - A configurable format that can represent different levels of precision based on exponent bits. Aims to be more efficient than IEEE 754 binary floats. Has adjustable dynamic range and precision.\nFlexpoint - A format introduced by Intel that can dynamically adjust precision across layers or within a layer. Allows tuning precision to accuracy and hardware requirements.\nBF16ALT - A proposed 16-bit format by ARM as an alternative to bfloat16. Uses additional bit in exponent to prevent overflow/underflow.\nTF32 - Introduced by Nvidia for Ampere GPUs. Uses 10 bits for exponent instead of 8 bits like FP32. Improves model training performance while maintaining accuracy.\nFP8 - 8-bit floating point format that keeps 6 bits for mantissa and 2 bits for exponent. Enables better dynamic range than integers.\n\nThe key goals of these new formats are to provide lower precision alternatives to 32-bit floats for better computational efficiency and performance on AI accelerators while maintaining model accuracy. They offer different tradeoffs in terms of precision, range and implementation cost/complexity.\n\n\n\n9.3.2 Efficiency Benefits\nNumerical efficiency matters for machine learning workloads for a number of reasons:\nComputational Efficiency: High-precision computations (like FP32 or FP64) can be slow and resource-intensive. By reducing numeric precision, one can achieve faster computation times, especially on specialized hardware that supports lower precision.\nMemory Efficiency: Storage requirements decrease with reduced numeric precision. For instance, FP16 requires half the memory of FP32. This is crucial when deploying models to edge devices with limited memory or when working with very large models.\nPower Efficiency: Lower precision computations often consume less power, which is especially important for battery-operated devices.\nNoise Introduction: Interestingly, the noise introduced by using lower precision can sometimes act as a regularizer, helping to prevent overfitting in some models.\nHardware Acceleration: Many modern AI accelerators and GPUs are optimized for lower precision operations, leveraging the efficiency benefits of such numerics.\nEfficient numerics is not just about reducing the bit-width of numbers but understanding the trade-offs between accuracy and efficiency. As machine learning models become more pervasive, especially in real-world, resource-constrained environments, the focus on efficient numerics will continue to grow. By thoughtfully selecting and leveraging the appropriate numeric precision, one can achieve robust model performance while optimizing for speed, memory, and energy.\n\n\n9.3.3 Numeric Representation Nuances\nThere are a number of nuances with numerical representations for ML that require us to have an understanding of both the theoretical and practical aspects of numerics representation, as well as a keen awareness of the specific requirements and constraints of the application domain.\n\nMemory Usage\nThe memory footprint of ML models, particularly those of considerable complexity and depth, can be substantial, thereby posing a significant challenge in both training and deployment phases. For instance, a deep neural network with 100 million parameters, represented using Float32 (32 bits or 4 bytes per parameter), would necessitate approximately 400 MB of memory just for storing the model weights. This does not account for additional memory requirements during training for storing gradients, optimizer states, and forward pass caches, which can further amplify the memory usage, potentially straining the resources on certain hardware, especially edge devices with limited memory capacity.\n\n\nImpact on Model Parameters and Weights\nThe numeric representation casts a significant impact on the storage and computational requisites of ML model parameters and weights. For instance, a model utilizing Float64 for weights will demand double the memory and potentially increased computational time compared to a counterpart employing Float32. A weight matrix, for instance, with dimensions [1000, 1000] using Float64 would consume approximately 8MB of memory, whereas using Float32 would halve this to approximately 4MB.\n\n\nComputational Complexity\nNumerical precision directly impacts computational complexity, influencing the time and resources required to perform arithmetic operations. For example, operations using Float64 generally consume more computational resources than their Float32 or Float16 counterparts (see Figure 9.14). In the realm of ML, where models might need to process millions of operations (e.g., multiplications and additions in matrix operations during forward and backward passes), even minor differences in the computational complexity per operation can aggregate into a substantial impact on training and inference times. As shown in Figure 9.15, quantized models can be many times faster than their unquantized versions.\n\n\n\n\n\n\nFigure 9.14: Energy use by quantized operations. Credit: Mark Horowitz, Stanford University.\n\n\n\n\n\n\n\n\n\nFigure 9.15: Speed of three different models in normal and quantized form.\n\n\n\nIn addition to pure runtimes, there is also a concern over energy efficiency. Not all numerical computations are created equal from the underlying hardware standpoint. Some numerical operations are more energy efficient than others. For example, Figure 9.16 below shows that integer addition is much more energy efficient than integer multiplication.\n\n\n\n\n\n\nFigure 9.16: Energy use by quantized operations. Credit: Isscc (2014).\n\n\nIsscc. 2014. “Computing’s Energy Problem (and What We Can Do about It).” https://ieeexplore.ieee.org/document/6757323.\n\n\n\n\nHardware Compatibility\nEnsuring compatibility and optimized performance across diverse hardware platforms is another challenge in numerics representation. Different hardware, such as CPUs, GPUs, TPUs, and FPGAs, have varying capabilities and optimizations for handling different numeric precisions. For example, certain GPUs might be optimized for Float32 computations, while others might provide accelerations for Float16. Developing and optimizing ML models that can leverage the specific numerical capabilities of different hardware, while ensuring that the model maintains its accuracy and robustness, requires careful consideration and potentially additional development and testing efforts.\n\n\nPrecision and Accuracy Trade-offs\nThe trade-off between numerical precision and model accuracy is a nuanced challenge in numerics representation. Utilizing lower-precision numerics, such as Float16, might conserve memory and expedite computations but can also introduce issues like quantization error and reduced numerical range. For instance, training a model with Float16 might introduce challenges in representing very small gradient values, potentially impacting the convergence and stability of the training process. Furthermore, in certain applications, such as scientific simulations or financial computations, where high precision is paramount, the use of lower-precision numerics might not be permissible due to the risk of accruing significant errors.\n\n\nTrade-off Examples\nTo understand and appreciate the nuances let’s consider some use case examples. Through these we will realize that the choice of numeric representation is not merely a technical decision but a strategic one, influencing the model’s predictive acumen, its computational demands, and its deployability across diverse computational environments. In this section we will look at a couple of examples to better understand the trade-offs with numerics and how they tie to the real world.\n\nAutonomous Vehicles\nIn the domain of autonomous vehicles, ML models are employed to interpret sensor data and make real-time decisions. The models must process high-dimensional data from various sensors (e.g., LiDAR, cameras, radar) and execute numerous computations within a constrained time frame to ensure safe and responsive vehicle operation. So the trade-offs here would include:\n\nMemory Usage: Storing and processing high-resolution sensor data, especially in floating-point formats, can consume substantial memory.\nComputational Complexity: Real-time processing demands efficient computations, where higher-precision numerics might impede the timely execution of control actions.\n\n\n\nMobile Health Applications\nMobile health applications often utilize ML models for tasks like activity recognition, health monitoring, or predictive analytics, operating within the resource-constrained environment of mobile devices. The trade-offs here would include:\n\nPrecision and Accuracy Trade-offs: Employing lower-precision numerics to conserve resources might impact the accuracy of health predictions or anomaly detections, which could have significant implications for user health and safety.\nHardware Compatibility: Models need to be optimized for diverse mobile hardware, ensuring efficient operation across a wide range of devices with varying numerical computation capabilities.\n\n\n\nHigh-Frequency Trading (HFT) Systems\nHFT systems leverage ML models to make rapid trading decisions based on real-time market data. These systems demand extremely low-latency responses to capitalize on short-lived trading opportunities.\n\nComputational Complexity: The models must process and analyze vast streams of market data with minimal latency, where even slight delays, potentially introduced by higher-precision numerics, can result in missed opportunities.\nPrecision and Accuracy Trade-offs: Financial computations often demand high numerical precision to ensure accurate pricing and risk assessments, posing challenges in balancing computational efficiency and numerical accuracy.\n\n\n\nEdge-Based Surveillance Systems\nSurveillance systems deployed on edge devices, like security cameras, utilize ML models for tasks like object detection, activity recognition, and anomaly detection, often operating under stringent resource constraints.\n\nMemory Usage: Storing pre-trained models and processing video feeds in real-time demands efficient memory usage, which can be challenging with high-precision numerics.\nHardware Compatibility: Ensuring that models can operate efficiently on edge devices with varying hardware capabilities and optimizations for different numeric precisions is crucial for widespread deployment.\n\n\n\nScientific Simulations\nML models are increasingly being utilized in scientific simulations, such as climate modeling or molecular dynamics simulations, to enhance predictive capabilities and reduce computational demands.\n\nPrecision and Accuracy Trade-offs: Scientific simulations often require high numerical precision to ensure accurate and reliable results, which can conflict with the desire to reduce computational demands via lower-precision numerics.\nComputational Complexity: The models must manage and process complex, high-dimensional simulation data efficiently to ensure timely results and enable large-scale or long-duration simulations.\n\nThese examples illustrate diverse scenarios where the challenges of numerics representation in ML models are prominently manifested. Each system presents a unique set of requirements and constraints, necessitating tailored strategies and solutions to navigate the challenges of memory usage, computational complexity, precision-accuracy trade-offs, and hardware compatibility.\n\n\n\n\n9.3.4 Quantization\nQuantization is prevalent in various scientific and technological domains, and it essentially involves the mapping or constraining of a continuous set or range into a discrete counterpart to minimize the number of bits required.\n\nHistory\nHistorically, the idea of quantization is not novel and can be traced back to ancient times, particularly in the realm of music and astronomy. In music, the Greeks utilized a system of tetrachords, segmenting the continuous range of pitches into discrete notes, thereby quantizing musical sounds. In astronomy and physics, the concept of quantization was present in the discretized models of planetary orbits, as seen in the Ptolemaic and Copernican systems.\nDuring the 1800s, quantization-based discretization was used to approximate the calculation of integrals, and further used to investigate the impact of rounding errors on the integration result. With algorithms, Lloyd’s K-Means Algorithm is a classic example of quantization. However, the term “quantization” was firmly embedded in scientific literature with the advent of quantum mechanics in the early 20th century, where it was used to describe the phenomenon that certain physical properties, such as energy, exist only in discrete, quantized states. This principle was pivotal in explaining phenomena at the atomic and subatomic levels. In the digital age, quantization found its application in signal processing, where continuous signals are converted into a discrete digital form, and in numerical algorithms, where computations on real-valued numbers are performed with finite-precision arithmetic.\nExtending upon this second application and relevant to this section, it is used in computer science to optimize neural networks by reducing the precision of the network weights. Thus, quantization, as a concept, has been subtly woven into the tapestry of scientific and technological development, evolving and adapting to the needs and discoveries of various epochs.\n\n\nInitial Breakdown\nWe begin our foray into quantization with a brief analysis of one important use for quantization.\nIn signal processing, the continuous sine wave (shown in Figure 9.17) can be quantized into discrete values through a process known as sampling. This is a fundamental concept in digital signal processing and is crucial for converting analog signals (like the continuous sine wave) into a digital form that can be processed by computers. The sine wave is a prevalent example due to its periodic and smooth nature, making it a useful tool for explaining concepts like frequency, amplitude, phase, and, of course, quantization.\n\n\n\n\n\n\nFigure 9.17: Sine Wave.\n\n\n\nIn the quantized version shown in Figure 9.18, the continuous sine wave (Figure 9.17) is sampled at regular intervals (in this case, every \\(\\frac{\\pi}{4}\\) radians), and only these sampled values are represented in the digital version of the signal. The step-wise lines between the points show one way to represent the quantized signal in a piecewise-constant form. This is a simplified example of how analog-to-digital conversion works, where a continuous signal is mapped to a discrete set of values, enabling it to be represented and processed digitally.\n\n\n\n\n\n\nFigure 9.18: Quantized Sine Wave.\n\n\n\nReturning to the context of Machine Learning (ML), quantization refers to the process of constraining the possible values that numerical parameters (such as weights and biases) can take to a discrete set, thereby reducing the precision of the parameters and consequently, the model’s memory footprint. When properly implemented, quantization can reduce model size by up to 4x and improve inference latency and throughput by up to 2-3x. Figure 9.19 illustrates the impact that quantization has on different models’ sizes: for example, an Image Classification model like ResNet-v2 can be compressed from 180MB down to 45MB with 8-bit quantization. There is typically less than 1% loss in model accuracy from well tuned quantization. Accuracy can often be recovered by re-training the quantized model with quantization-aware training techniques. Therefore, this technique has emerged to be very important in deploying ML models to resource-constrained environments, such as mobile devices, IoT devices, and edge computing platforms, where computational resources (memory and processing power) are limited.\n\n\n\n\n\n\nFigure 9.19: Effect of quantization on model sizes. Credit: HarvardX.\n\n\n\nThere are several dimensions to quantization such as uniformity, stochasticity (or determinism), symmetry, granularity (across layers/channels/groups or even within channels), range calibration considerations (static vs dynamic), and fine-tuning methods (QAT, PTQ, ZSQ). We examine these below.\n\n\n\n9.3.5 Types\n\nUniform Quantization\nUniform quantization involves mapping continuous or high-precision values to a lower-precision representation using a uniform scale. This means that the interval between each possible quantized value is consistent. For example, if weights of a neural network layer are quantized to 8-bit integers (values between 0 and 255), a weight with a floating-point value of 0.56 might be mapped to an integer value of 143, assuming a linear mapping between the original and quantized scales. Due to its use of integer or fixed-point math pipelines, this form of quantization allows computation on the quantized domain without the need to dequantize beforehand.\nThe process for implementing uniform quantization starts with choosing a range of real numbers to be quantized. The next step is to select a quantization function and map the real values to the integers representable by the bit-width of the quantized representation. For instance, a popular choice for a quantization function is:\n\\[\nQ(r)=Int(r/S) - Z\n\\]\nwhere Q is the quantization operator, r is a real valued input (in our case, an activation or weight), S is a real valued scaling factor, and Z is an integer zero point. The Int function maps a real value to an integer value through a rounding operation. Through this function, we have effectively mapped real values r to some integer values, resulting in quantized levels which are uniformly spaced.\nWhen the need arises for practitioners to retrieve the original higher precision values, real values r can be recovered from quantized values through an operation known as dequantization. In the example above, this would mean performing the following operation on our quantized value:\n\\[\n\\bar{r} = S(Q(r) + Z)\n\\]\nAs discussed, some precision in the real value is lost by quantization. In this case, the recovered value \\(\\bar{r}\\) will not exactly match r due to the rounding operation. This is an important tradeoff to note; however, in many successful uses of quantization, the loss of precision can be negligible and the test accuracy remains high. Despite this, uniform quantization continues to be the current de-facto choice due to its simplicity and efficient mapping to hardware.\n\n\nNon-uniform Quantization\nNon-uniform quantization, on the other hand, does not maintain a consistent interval between quantized values. This approach might be used to allocate more possible discrete values in regions where the parameter values are more densely populated, thereby preserving more detail where it is most needed. For instance, in bell-shaped distributions of weights with long tails, a set of weights in a model predominantly lies within a certain range; thus, more quantization levels might be allocated to that range to preserve finer details, enabling us to better capture information. However, one major weakness of non-uniform quantization is that it requires dequantization before higher precision computations due to its non-uniformity, restricting its ability to accelerate computation compared to uniform quantization.\nTypically, a rule-based non-uniform quantization uses a logarithmic distribution of exponentially increasing steps and levels as opposed to linearly. Another popular branch lies in binary-code-based quantization where real number vectors are quantized into binary vectors with a scaling factor. Notably, there is no closed form solution for minimizing errors between the real value and non-uniformly quantized value, so most quantizations in this field rely on heuristic solutions. For instance, recent work by Xu et al. (2018) formulates non-uniform quantization as an optimization problem where the quantization steps/levels in quantizer Q are adjusted to minimize the difference between the original tensor and quantized counterpart.\n\nXu, Chen, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang, and Hongbin Zha. 2018. “Alternating Multi-Bit Quantization for Recurrent Neural Networks.” In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=S19dR9x0b.\n\\[\n\\min_Q ||Q(r)-r||^2\n\\]\nFurthermore, learnable quantizers can be jointly trained with model parameters, and the quantization steps/levels are generally trained with iterative optimization or gradient descent. Additionally, clustering has been used to alleviate information loss from quantization. While capable of capturing higher levels of detail, non-uniform quantization schemes can be difficult to deploy efficiently on general computation hardware, making it less-preferred to methods which use uniform quantization.\n\n\n\n\n\n\nFigure 9.20: Quantization uniformity. Credit: Gholami et al. (2021).\n\n\n\n\n\nStochastic Quantization\nUnlike the two previous approaches which generate deterministic mappings, there is some work exploring the idea of stochastic quantization for quantization-aware training and reduced precision training. This approach maps floating numbers up or down with a probability associated to the magnitude of the weight update. The hope generated by high level intuition is that such a probabilistic approach may allow a neural network to explore more, as compared to deterministic quantization. Supposedly, enabling a stochastic rounding may allow neural networks to escape local optimums, thereby updating its parameters. Below are two example stochastic mapping functions:\n\n\n\n\n\n\n\nFigure 9.21: Integer vs Binary quantization functions.\n\n\n\n\n\nZero Shot Quantization\nZero-shot quantization refers to the process of converting a full-precision deep learning model directly into a low-precision, quantized model without the need for any retraining or fine-tuning on the quantized model. The primary advantage of this approach is its efficiency, as it eliminates the often time-consuming and resource-intensive process of retraining a model post-quantization. By leveraging techniques that anticipate and minimize quantization errors, zero-shot quantization aims to maintain the model’s original accuracy even after reducing its numerical precision. It is particularly useful for Machine Learning as a Service (MLaaS) providers aiming to expedite the deployment of their customer’s workloads without having to access their datasets.\n\n\n\n9.3.6 Calibration\nCalibration is the process of selecting the most effective clipping range [\\(\\alpha\\), \\(\\beta\\)] for weights and activations to be quantized to. For example, consider quantizing activations that originally have a floating-point range between -6 and 6 to 8-bit integers. If you just take the minimum and maximum possible 8-bit integer values (-128 to 127) as your quantization range, it might not be the most effective. Instead, calibration would involve passing a representative dataset then use this observed range for quantization.\nThere are many calibration methods but a few commonly used include:\n\nMax: Use the maximum absolute value seen during calibration. However, this method is susceptible to outlier data. Notice how in Figure 9.22, we have an outlier cluster around 2.1, while the rest are clustered around smaller values.\nEntropy: Use KL divergence to minimize information loss between the original floating-point values and values that could be represented by the quantized format. This is the default method used by TensorRT.\nPercentile: Set the range to a percentile of the distribution of absolute values seen during calibration. For example, 99% calibration would clip 1% of the largest magnitude values.\n\n\n\n\n\n\n\nFigure 9.22: Input activations to layer 3 in ResNet50. Credit: @Wu, Judd, and Isaev (2020).\n\n\n\nImportantly, the quality of calibration can make a difference between a quantized model that retains most of its accuracy and one that degrades significantly. Hence, it’s an essential step in the quantization process. When choosing a calibration range, there are two types: symmetric and asymmetric.\n\nSymmetric Quantization\nSymmetric quantization maps real values to a symmetrical clipping range centered around 0. This involves choosing a range [\\(\\alpha\\), \\(\\beta\\)] where \\(\\alpha = -\\beta\\). For example, one symmetrical range would be based on the min/max values of the real values such that: -\\(\\alpha = \\beta = max(abs(r_{max}), abs(r_{min}))\\).\nSymmetric clipping ranges are the most widely adopted in practice as they have the advantage of easier implementation. In particular, the mapping of zero to zero in the clipping range (sometimes called “zeroing out of the zero point”) can lead to reduction in computational cost during inference (Wu, Judd, and Isaev (2020)).\n\n\nAsymmetric Quantization\nAsymmetric quantization maps real values to an asymmetrical clipping range that isn’t necessarily centered around 0, as shown in Figure 9.23 on the right. It involves choosing a range [\\(\\alpha\\), \\(\\beta\\)] where \\(\\alpha \\neq -\\beta\\). For example, selecting a range based on the minimum and maximum real values, or where \\(\\alpha = r_{min}\\) and \\(\\beta = r_{max}\\), creates an asymmetric range. Typically, asymmetric quantization produces tighter clipping ranges compared to symmetric quantization, which is important when target weights and activations are imbalanced, e.g., the activation after the ReLU always has non-negative values. Despite producing tighter clipping ranges, asymmetric quantization is less preferred to symmetric quantization as it doesn’t always zero out the real value zero.\n\n\n\n\n\n\nFigure 9.23: Quantization (a)symmetry. Credit: Gholami et al. (2021).\n\n\n\n\n\nGranularity\nUpon deciding the type of clipping range, it is essential to tighten the range to allow a model to retain as much of its accuracy as possible. We’ll be taking a look at convolutional neural networks as our way of exploring methods that fine tune the granularity of clipping ranges for quantization. The input activation of a layer in our CNN undergoes convolution with multiple convolutional filters. Every convolutional filter can possess a unique range of values. Notice how in Figure 9.24, the range for Filter1 is much smaller than that for Filter 3. Consequently, one distinguishing feature of quantization approaches is the precision with which the clipping range [α,β] is determined for the weights.\n\n\n\n\n\n\nFigure 9.24: Quantization granularity: variable ranges. Credit: Gholami et al. (2021).\n\n\n\n\nLayerwise Quantization: This approach determines the clipping range by considering all of the weights in the convolutional filters of a layer. Then, the same clipping range is used for all convolutional filters. It’s the simplest to implement, and, as such, it often results in sub-optimal accuracy due the wide variety of differing ranges between filters. For example, a convolutional kernel with a narrower range of parameters loses its quantization resolution due to another kernel in the same layer having a wider range.\nGroupwise Quantization: This approach groups different channels inside a layer to calculate the clipping range. This method can be helpful when the distribution of parameters across a single convolution/activation varies a lot. In practice, this method was useful in Q-BERT (Shen et al. 2020) for quantizing Transformer (Vaswani et al. 2017) models that consist of fully-connected attention layers. The downside with this approach comes with the extra cost of accounting for different scaling factors.\nChannelwise Quantization: This popular method uses a fixed range for each convolutional filter that is independent of other channels. Because each channel is assigned a dedicated scaling factor, this method ensures a higher quantization resolution and often results in higher accuracy.\nSub-channelwise Quantization: Taking channelwise quantization to the extreme, this method determines the clipping range with respect to any groups of parameters in a convolution or fully-connected layer. It may result in considerable overhead since different scaling factors need to be taken into account when processing a single convolution or fully-connected layer.\n\n\nShen, Sheng, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2020. “Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.” Proceedings of the AAAI Conference on Artificial Intelligence 34 (05): 8815–21. https://doi.org/10.1609/aaai.v34i05.6409.\n\nVaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.\nOf these, channelwise quantization is the current standard used for quantizing convolutional kernels, since it enables the adjustment of clipping ranges for each individual kernel with negligible overhead.\n\n\nStatic and Dynamic Quantization\nAfter determining the type and granularity of the clipping range, practitioners must decide when ranges are determined in their range calibration algorithms. There are two approaches to quantizing activations: static quantization and dynamic quantization.\nStatic quantization is the most frequently used approach. In this, the clipping range is pre-calculated and static during inference. It does not add any computational overhead, but, consequently, results in lower accuracy as compared to dynamic quantization. A popular method of implementing this is to run a series of calibration inputs to compute the typical range of activations [Quantization and training of neural networks for efficient integer-arithmetic-only inference, Dyadic neural network quantization].\nDynamic quantization is an alternative approach which dynamically calculates the range for each activation map during runtime. The approach requires real-time computations which might have a very high overhead. By doing this, dynamic quantization often achieves the highest accuracy as the range is calculated specifically for each input.\nBetween the two, calculating the range dynamically usually is very costly, so most practitioners will often use static quantization instead.\n\n\n\n9.3.7 Techniques\nThe two prevailing techniques for quantizing models are Post Training Quantization and Quantization-Aware Training.\nPost Training Quantization - Post-training quantization (PTQ) is a quantization technique where the model is quantized after it has been trained. The model is trained in floating point and then weights and activations are quantized as a post-processing step. This is the simplest approach and does not require access to the training data. Unlike Quantization-Aware Training (QAT), PTQ sets weight and activation quantization parameters directly, making it low-overhead and suitable for limited or unlabeled data situations. However, not readjusting the weights after quantizing, especially in low-precision quantization can lead to very different behavior and thus lower accuracy. To tackle this, techniques like bias correction, equalizing weight ranges, and adaptive rounding methods have been developed. PTQ can also be applied in zero-shot scenarios, where no training or testing data are available. This method has been made even more efficient to benefit compute- and memory- intensive large language models. Recently, SmoothQuant, a training-free, accuracy-preserving, and general-purpose PTQ solution which enables 8-bit weight, 8-bit activation quantization for LLMs, has been developed, demonstrating up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy (Xiao et al. (2022)).\nIn PTQ, a pretrained model undergoes a calibration process, as shown in Figure 9.25. Calibration involves using a separate dataset known as calibration data, a specific subset of the training data reserved for quantization to help find the appropriate clipping ranges and scaling factors.\n\n\n\n\n\n\nFigure 9.25: Post-Training Quantization and calibration. Credit: Gholami et al. (2021).\n\n\n\nQuantization-Aware Training - Quantization-aware training (QAT) is a fine-tuning of the PTQ model. The model is trained aware of quantization, allowing it to adjust for quantization effects. This produces better accuracy with quantized inference. Quantizing a trained neural network model with methods such as PTQ introduces perturbations that can deviate the model from its original convergence point. For instance, Krishnamoorthi showed that even with per-channel quantization, networks like MobileNet do not reach baseline accuracy with int8 Post Training Quantization (PTQ) and require Quantization-Aware Training (QAT) (Krishnamoorthi (2018)).To address this, QAT retrains the model with quantized parameters, employing forward and backward passes in floating point but quantizing parameters after each gradient update. Handling the non-differentiable quantization operator is crucial; a widely used method is the Straight Through Estimator (STE), approximating the rounding operation as an identity function. While other methods and variations exist, STE remains the most commonly used due to its practical effectiveness. In QAT, a pretrained model is quantized and then finetuned using training data to adjust parameters and recover accuracy degradation, as shown in Figure 9.26. The calibration process is often conducted in parallel with the finetuning process for QAT.\n\n\n\n\n\n\nFigure 9.26: Quantization-Aware Training. Credit: Gholami et al. (2021).\n\n\nGholami, Dong Kim, Mahoney Yao, and Keutzer. 2021. “A Survey of Quantization Methods for Efficient Neural Network Inference).” ArXiv Preprint. https://arxiv.org/abs/2103.13630.\n\n\nQuantization-Aware Training serves as a natural extension of Post-Training Quantization. Following the initial quantization performed by PTQ, QAT is used to further refine and fine-tune the quantized parameters - see how in Figure 9.27, the PTQ model undergoes an additional step, QAT. It involves a retraining process where the model is exposed to additional training iterations using the original data. This dynamic training approach allows the model to adapt and adjust its parameters, compensating for the performance degradation caused by quantization.\n\n\n\n\n\n\nFigure 9.27: PTQ and QAT. Credit: “The Ultimate Guide to Deep Learning Model Quantization and Quantization-Aware Training” (n.d.).\n\n\n“The Ultimate Guide to Deep Learning Model Quantization and Quantization-Aware Training.” n.d. https://deci.ai/quantization-and-quantization-aware-training/.\n\n\nFigure 9.28 shows the relative accuracy of different models after PTQ and QAT. In almost all cases, QAT yields a better accuracy than PTQ. Consider for example EfficientNet b0. After PTQ, the accuracy drops from 76.85% to 72.06%. But when we apply QAT, the accuracy rebounds to 76.95% (with even a slight improvement over the original accuracy).\n\n\n\n\n\n\nFigure 9.28: Relative accuracies of PTQ and QAT. Credit: Wu, Judd, and Isaev (2020).\n\n\n\n\n\n\n\n\n\n\n\n\nFeature/Technique\nPost Training Quantization\nQuantization-Aware Training\nDynamic Quantization\n\n\n\n\nPros\n\n\n\n\n\nSimplicity\n✓\n✗\n✗\n\n\nAccuracy Preservation\n✗\n✓\n✓\n\n\nAdaptability\n✗\n✗\n✓\n\n\nOptimized Performance\n✗\n✓\nPotentially\n\n\nCons\n\n\n\n\n\nAccuracy Degradation\n✓\n✗\nPotentially\n\n\nComputational Overhead\n✗\n✓\n✓\n\n\nImplementation Complexity\n✗\n✓\n✓\n\n\nTradeoffs\n\n\n\n\n\nSpeed vs. Accuracy\n✓\n✗\n✗\n\n\nAccuracy vs. Cost\n✗\n✓\n✗\n\n\nAdaptability vs. Overhead\n✗\n✗\n✓\n\n\n\n\n\n9.3.8 Weights vs. Activations\nWeight Quantization: Involves converting the continuous or high-precision weights of a model to lower-precision, such as converting Float32 weights to quantized INT8 (integer) weights - in Figure 9.29, weight quantization is taking place in the second step (red squares) when we multiply the inputs. This reduces the model size, thereby reducing the memory required to store the model and the computational resources needed to perform inference. For example, consider a weight matrix in a neural network layer with Float32 weights as [0.215, -1.432, 0.902, …]. Through weight quantization, these might be mapped to INT8 values like [27, -183, 115, …], significantly reducing the memory required to store them.\n\n\n\n\n\n\nFigure 9.29: Weight and activation quantization. Credit: HarvardX.\n\n\n\nActivation Quantization: Involves quantizing the activation values (outputs of layers) during model inference. This can reduce the computational resources required during inference, but it introduces additional challenges in maintaining model accuracy due to the reduced precision of intermediate computations. For example, in a convolutional neural network (CNN), the activation maps (feature maps) produced by convolutional layers, originally in Float32, might be quantized to INT8 during inference to accelerate computation, especially on hardware optimized for integer arithmetic. Additionally, recent work has explored the use of Activation-aware Weight Quantization for LLM compression and acceleration, which involves protecting only 1% of the most important salient weights by observing the activations not weights (Lin et al. (2023)).\n\n\n9.3.9 Trade-offs\nQuantization invariably introduces a trade-off between model size/performance and accuracy. While it significantly reduces the memory footprint and can accelerate inference, especially on hardware optimized for low-precision arithmetic, the reduced precision can degrade model accuracy.\nModel Size: A model with weights represented as Float32 being quantized to INT8 can theoretically reduce the model size by a factor of 4, enabling it to be deployed on devices with limited memory. The model size of large language models is developing at a faster pace than the GPU memory in recent years, leading to a big gap between the supply and demand for memory. Figure 9.30 illustrates the recent trend of the widening gap between model size (red line) and acceleartor memory (yellow line). Quantization and model compression techniques can help bridge the gap\n\n\n\n\n\n\nFigure 9.30: Model size vs. accelerator memory. Credit: Xiao et al. (2022).\n\n\nXiao, Seznec Lin, Demouth Wu, and Han. 2022. “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.” ArXiv Preprint. https://arxiv.org/abs/2211.10438.\n\n\nInference Speed: Quantization can also accelerate inference, as lower-precision arithmetic is computationally less expensive. For example, certain hardware accelerators, like Google’s Edge TPU, are optimized for INT8 arithmetic and can perform inference significantly faster with INT8 quantized models compared to their floating-point counterparts. The reduction in memory from quantization helps reduce the amount of data transmission, saving up memory and speeding the process. Figure 9.31 compares the increase in throughput and the reduction in bandwidth memory for different data type on the NVIDIA Turing GPU.\n\n\n\n\n\n\nFigure 9.31: Benefits of lower precision data types. Credit: Wu, Judd, and Isaev (2020).\n\n\nWu, Zhang Judd, and Micikevicius Isaev. 2020. “Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation).” ArXiv Preprint. https://arxiv.org/abs/2004.09602.\n\n\nAccuracy: The reduction in numerical precision post-quantization can lead to a degradation in model accuracy, which might be acceptable in certain applications (e.g., image classification) but not in others (e.g., medical diagnosis). Therefore, post-quantization, the model typically requires re-calibration or fine-tuning to mitigate accuracy loss. Furthermore, recent work has explored the use of Activation-aware Weight Quantization (Lin et al. (2023)) which is based on the observation that protecting only 1% of salient weights can greatly reduce quantization error.\n\n\n9.3.10 Quantization and Pruning\nPruning and quantization work well together, and it’s been found that pruning doesn’t hinder quantization. In fact, pruning can help reduce quantization error. Intuitively, this is due to pruning reducing the number of weights to quantize, thereby reducing the accumulated error from quantization. For example, an unpruned AlexNet has 60 million weights to quantize whereas a pruned AlexNet only has 6.7 million weights to quantize. This significant drop in weights helps reduce the error between quantizing the unpruned AlexNet vs. the pruned AlexNet. Furthermore, recent work has found that quantization-aware pruning generates more computationally efficient models than either pruning or quantization alone; It typically performs similar to or better in terms of computational efficiency compared to other neural architecture search techniques like Bayesian optimization (Hawks et al. (2021)).\n\n\n\n\n\n\nFigure 9.32: Accuracy vs. compression rate under different compression methods. Credit: Han, Mao, and Dally (2015).\n\n\nHan, Song, Huizi Mao, and William J Dally. 2015. “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding.” arXiv Preprint arXiv:1510.00149.\n\n\n\n\n9.3.11 Edge-aware Quantization\nQuantization not only reduces model size but also enables faster computations and draws less power, making it vital to edge development. Edge devices typically have tight resource constraints with compute, memory, and power, which are impossible to meet for many of the deep NN models of today. Furthermore, edge processors do not support floating point operations, making integer quantization particularly important for chips like GAP-8, a RISC-V SoC for edge inference with a dedicated CNN accelerator, which only support integer arithmetic..\nOne hardware platform utilizing quantization is the ARM Cortex-M group of 32-bit RISC ARM processor cores. They leverage fixed-point quantization with power of two scaling factors so that quantization and dequantization can be efficiently done by bit shifting. Additionally, Google Edge TPUs, Google’s emerging solution for running inference at the edge, is designed for small, low-powered devices and can only support 8-bit arithmetic. Many complex neural network models that could only be deployed on servers due to their high computational needs can now be run on edge devices thanks to recent advancements (e.g. quantization methods) in edge computing field.\nIn addition to being an indispensable technique for many edge processors, quantization has also brought noteworthy improvements to non-edge processors such as encouraging such processors to meet the Service Level Agreement (SLA) requirements such as 99th percentile latency.\nThus, quantization combined with efficient low-precision logic and dedicated deep learning accelerators, has been one crucial driving force for the evolution of such edge processors.\nVideo 9.1 is a lecture on quantization and the different quantization methods.\n\n\n\n\n\n\nVideo 9.1: Quantization", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -968,7 +968,7 @@ "section": "9.4 Efficient Hardware Implementation", "text": "9.4 Efficient Hardware Implementation\nEfficient hardware implementation transcends the selection of suitable components; it requires a holistic understanding of how software will interact with underlying architectures. The essence of achieving peak performance in TinyML applications lies not only in refining algorithms to hardware but also in ensuring that the hardware is strategically tailored to support these algorithms. This synergy between hardware and software is crucial. As we delve deeper into the intricacies of efficient hardware implementation, the significance of a co-design approach, where hardware and software are developed in tandem, becomes increasingly evident. This section provides an overview of the techniques of how hardware and the interactions between hardware and software can be optimized to improve models performance.\n\n9.4.1 Hardware-Aware Neural Architecture Search\nFocusing only on the accuracy when performing Neural Architecture Search leads to models that are exponentially complex and require increasing memory and compute. This has lead to hardware constraints limiting the exploitation of the deep learning models at their full potential. Manually designing the architecture of the model is even harder when considering the hardware variety and limitations. This has lead to the creation of Hardware-aware Neural Architecture Search that incorporate the hardware contractions into their search and optimize the search space for a specific hardware and accuracy. HW-NAS can be categorized based how it optimizes for hardware. We will briefly explore these categories and leave links to related papers for the interested reader.\n\nSingle Target, Fixed Platfrom Configuration\nThe goal here is to find the best architecture in terms of accuracy and hardware efficiency for one fixed target hardware. For a specific hardware, the Arduino Nicla Vision for example, this category of HW-NAS will look for the architecture that optimizes accuracy, latency, energy consumption, etc.\n\nHardware-aware Search Strategy\nHere, the search is a multi-objective optimization problem, where both the accuracy and hardware cost guide the searching algorithm to find the most efficient architecture (Tan et al. 2019; Cai, Zhu, and Han 2019; B. Wu et al. 2019).\n\nTan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. 2019. “MnasNet: Platform-aware Neural Architecture Search for Mobile.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2820–28. IEEE. https://doi.org/10.1109/cvpr.2019.00293.\n\nCai, Han, Ligeng Zhu, and Song Han. 2019. “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware.” In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=HylVB3AqYm.\n\nWu, Bichen, Kurt Keutzer, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, and Yangqing Jia. 2019. “FBNet: Hardware-aware Efficient ConvNet Design via Differentiable Neural Architecture Search.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10734–42. IEEE. https://doi.org/10.1109/cvpr.2019.01099.\n\n\nHardware-aware Search Space\nHere, the search space is restricted to the architectures that perform well on the specific hardware. This can be achieved by either measuring the operators (Conv operator, Pool operator, …) performance, or define a set of rules that limit the search space. (L. L. Zhang et al. 2020)\n\nZhang, Li Lyna, Yuqing Yang, Yuhang Jiang, Wenwu Zhu, and Yunxin Liu. 2020. “Fast Hardware-Aware Neural Architecture Search.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE. https://doi.org/10.1109/cvprw50498.2020.00354.\n\n\n\nSingle Target, Multiple Platform Configurations\nSome hardwares may have different configurations. For example, FPGAs have Configurable Logic Blocks (CLBs) that can be configured by the firmware. This method allows for the HW-NAS to explore different configurations. (Hu et al. 2023; Ho Yoon et al. 2012)\n\nHo Yoon, Jung, Hyung-Suk Jung, Min Hwan Lee, Gun Hwan Kim, Seul Ji Song, Jun Yeong Seok, Kyung Jean Yoon, et al. 2012. “Frontiers in Electronic Materials.” Wiley. https://doi.org/10.1002/9783527667703.ch67.\n\n\nMultiple Targets\nThis category aims at optimizing a single model for multiple hardwares. This can be helpful for mobile devices development as it can optimize to different phones models. (Chu et al. 2021; Hu et al. 2023)\n\nChu, Grace, Okan Arikan, Gabriel Bender, Weijun Wang, Achille Brighton, Pieter-Jan Kindermans, Hanxiao Liu, Berkin Akin, Suyog Gupta, and Andrew Howard. 2021. “Discovering Multi-Hardware Mobile Models via Architecture Search.” In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 3022–31. IEEE. https://doi.org/10.1109/cvprw53098.2021.00337.\n\nHu, Yang, Jie Jiang, Lifu Zhang, Yunfeng Shi, and Jian Shi. 2023. “Halide Perovskite Semiconductors.” Wiley. https://doi.org/10.1002/9783527829026.ch13.\n\n\nExamples of Hardware-Aware Neural Architecture Search\n\nTinyNAS\nTinyNAS adopts a two stage approach to finding an optimal architecture for model with the constraints of the specific microcontroller in mind.\nFirst, TinyNAS generate multiple search spaces by varying the input resolution of the model, and the number of channels of the layers of the model. Then, TinyNAS chooses a search space based on the FLOPs (Floating Point Operations Per Second) of each search space. Spaces with a higher probability of containiung architectures with a large number of FLOPs yields models with higher accuracies - compare the red line vs. the black line in Figure 9.33. Since a higher number FLOPs means the model has a higher computational capacity, the model is more likely to have a higher accuracy.\nThen, TinyNAS performs a search operation on the chosen space to find the optimal architecture for the specific constraints of the microcontroller. (J. Lin et al. 2020)\n\n\n\n\n\n\nFigure 9.33: Search spaces accuracy. Credit: J. Lin et al. (2020).\n\n\nLin, Ji, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han. 2020. “MCUNet: Tiny Deep Learning on IoT Devices.” In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/86c51678350f656dcc7f490a43946ee5-Abstract.html.\n\n\n\n\n\nTopology-Aware NAS\nFocuses on creating and optimizing a search space that aligns with the hardware topology of the device. (T. Zhang et al. 2020)\n\nZhang, Tunhou, Hsin-Pai Cheng, Zhenwen Li, Feng Yan, Chengyu Huang, Hai Helen Li, and Yiran Chen. 2020. “AutoShrink: A Topology-Aware NAS for Discovering Efficient Neural Architecture.” In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, the Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, 6829–36. AAAI Press. https://aaai.org/ojs/index.php/AAAI/article/view/6163.\n\n\n\n9.4.2 Challenges of Hardware-Aware Neural Architecture Search\nWhile HW-NAS carries high potential for finding optimal architectures for TinyML, it comes with some challenges. Hardware Metrics like latency, energy consumption and hardware utilization are harder to evaluate than the metrics of accuracy or loss. They often require specilized tools for precise measurements. Moreover, adding all these metrics leads to a much bigger search space. This leads to HW-NAS being time-consuming and expensive. It has to be applied to every hardware for optimal results, moreover, meaning that if one needs to deploy the model on multiple devices, the search has to be conducted multiple times and will result in different models, unless optimizing for all of them which means less accuracy. Finally, hardware changes frequently, and HW-NAS may need to be conducted on each version.\n\n\n9.4.3 Kernel Optimizations\nKernel Optimizations are modifications made to the kernel to enhance the performance of machine learning models onf resource-constrained devices. We will separate kernel optimizations into two types.\n\nGeneral Kernel Optimizations\nThese are kernel optimizations that all devices can benefit from. They provide technics to convert the code to more efficient instructions.\n\nLoop unrolling\nInstead of having a loop with loop control (incrementing the loop counter, checking the loop termination condition) the loop can be unrolled and the overhead of loop control can be omitted. This may also provide additional opportunities for parallelism that may not be possible with the loop structure. This can be particularly beneficial for tight loops, where the boy of the loop is a small number of instructions with a lot of iterations.\n\n\nBlocking\nBlocking is used to make memory access patterns more efficient. If we have three computations the first and the last need to access cache A and the second needs to access cache B, blocking blocks the first two computations together to reduce the number of memory reads needed.\n\n\nTiling\nSimilarly to blocking, tiling divides data and computation into chunks, but extends beyond cache improvements. Tiling creates independent partitions of computation that can be run in parallel, which can result in significant performance improvements.:\n\n\nOptimized Kernel Libraries\nThis comprises developing optimized kernels that take full advantage of a specific hardware. One example is the CMSIS-NN library, which is a collection of efficient neural network kernels developed to optimize the performance and minimize the memory footprint of models on Arm Cortex-M processors, which are common on IoT edge devices. The kernel leverage multiple hardware capabilities of Cortex-M processors like Single Instruction Multple Data (SIMD), Floating Point Units (FPUs) and M-Profile Vector Extensions (MVE). These optimization make common operations like matrix multiplications more efficient, boosting the performance of model operations on Cortex-M processors. (Lai, Suda, and Chandra 2018)\n\nLai, Liangzhen, Naveen Suda, and Vikas Chandra. 2018. “CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-m CPUs.” https://arxiv.org/abs/1801.06601.\n\n\n\n\n9.4.4 Compute-in-Memory (CiM)\nThis is one example of Algorithm-Hardware Co-design. CiM is a computing paradigm that performs computation within memory. Therefore, CiM architectures allow for operations to be performed directly on the stored data, without the need to shuttle data back and forth between separate processing and memory units. This design paradigm is particularly beneficial in scenarios where data movement is a primary source of energy consumption and latency, such as in TinyML applications on edge devices. Figure 9.34 is one example of using CiM in TinyML: keyword spotting requires an always-on process that looks for certain wake words (such as ‘Hey, Siri’). Given the resource-intensive nature of this task, integrating CiM for the always-on keyword detection model can enhance efficiency.\nThrough algorithm-hardware co-design, the algorithms can be optimized to leverage the unique characteristics of CiM architectures, and conversely, the CiM hardware can be customized or configured to better support the computational requirements and characteristics of the algorithms. This is achieved by using the analog properties of memory cells, such as addition and multiplication in DRAM. (Zhou et al. 2021)\n\n\n\n\n\n\nFigure 9.34: CiM for keyword spotting. Credit: Zhou et al. (2021).\n\n\nZhou, Chuteng, Fernando Garcia Redondo, Julian Büchel, Irem Boybat, Xavier Timoneda Comas, S. R. Nandakumar, Shidhartha Das, Abu Sebastian, Manuel Le Gallo, and Paul N. Whatmough. 2021. “AnalogNets: Ml-hw Co-Design of Noise-Robust TinyML Models and Always-on Analog Compute-in-Memory Accelerator.” https://arxiv.org/abs/2111.06503.\n\n\n\n\n9.4.5 Memory Access Optimization\nDifferent devices may have different memory hierarchies. Optimizing for the specific memory hierarchy in the specific hardware can lead to great performance improvements by reducing the costly operations of reading and writing to memory. Dataflow optimization can be achieved by optimizing for reusing data within a single layer and across multiple layers. This dataflow optimization can be tailored to the specific memory hierarchy of the hardware, which can lead to greater benefits than general optimizations for different hardwares.\n\nLeveraging Sparsity\nPruning is a fundamental approach to compress models to make them compatible with resource constrained devices. This results in sparse models where a lot of weights are 0’s. Therefore, leveraging this sparsity can lead to significant improvements in performance. Tools were created to achieve exactly this. RAMAN, is a sparseTinyML accelerator designed for inference on edge devices. RAMAN overlap input and output activations on the same memory space, reducing storage requirements by up to 50%. (Krishna et al. 2023)\n\nKrishna, Adithya, Srikanth Rohit Nudurupati, Chandana D G, Pritesh Dwivedi, André van Schaik, Mahesh Mehendale, and Chetan Singh Thakur. 2023. “RAMAN: A Re-Configurable and Sparse TinyML Accelerator for Inference on Edge.” https://arxiv.org/abs/2306.06493.\n\n\nOptimization Frameworks\nOptimization Frameworks have been introduced to exploit the specific capabilities of the hardware to accelerate the software. One example of such a framework is hls4ml - Figure 9.35 provides an overview of the framework’s workflow. This open-source software-hardware co-design workflow aids in interpreting and translating machine learning algorithms for implementation with both FPGA and ASIC technologies. Features such as network optimization, new Python APIs, quantization-aware pruning, and end-to-end FPGA workflows are embedded into the hls4ml framework, leveraging parallel processing units, memory hierarchies, and specialized instruction sets to optimize models for edge hardware. Moreover, hls4ml is capable of translating machine learning algorithms directly into FPGA firmware.\n\n\n\n\n\n\nFigure 9.35: hls4ml framework workflow. Credit: Fahim et al. (2021).\n\n\nFahim, Farah, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, et al. 2021. “Hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices.” https://arxiv.org/abs/2103.05579.\n\n\nOne other framework for FPGAs that focuses on a holistic approach is CFU Playground (Prakash et al. 2023)\n\nPrakash, Shvetank, Tim Callahan, Joseph Bushagour, Colby Banbury, Alan V. Green, Pete Warden, Tim Ansell, and Vijay Janapa Reddi. 2023. “CFU Playground: Full-stack Open-Source Framework for Tiny Machine Learning (TinyML) Acceleration on FPGAs.” In 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Vol. abs/2201.01863. IEEE. https://doi.org/10.1109/ispass57527.2023.00024.\n\n\nHardware Built Around Software\nIn a contrasting approach, hardware can be custom-designed around software requirements to optimize the performance for a specific application. This paradigm creates specialized hardware to better adapt to the specifics of the software, thus reducing computational overhead and improving operational efficiency. One example of this approach is a voice-recognition application by (Kwon and Park 2021). The paper proposes a structure wherein preprocessing operations, traditionally handled by software, are allocated to custom-designed hardware. This technique was achieved by introducing resistor-transistor logic to an inter-integrated circuit sound module for windowing and audio raw data acquisition in the voice-recognition application. Consequently, this offloading of preprocessing operations led to a reduction in computational load on the software, showcasing a practical application of building hardware around software to enhance the efficiency and performance.\n\n\n\n\n\n\nFigure 9.36: Delegating data processing to an FPGA. Credit: Kwon and Park (2021).\n\n\nKwon, Jisu, and Daejin Park. 2021. “Hardware/Software Co-Design for TinyML Voice-Recognition Application on Resource Frugal Edge Devices.” Applied Sciences 11 (22): 11073. https://doi.org/10.3390/app112211073.\n\n\n\n\nSplitNets\nSplitNets were introduced in the context of Head-Mounted systems. They distribute the Deep Neural Networks (DNNs) workload among camera sensors and an aggregator. This is particularly compelling the in context of TinyML. The SplitNet framework is a split-aware NAS to find the optimal neural network architecture to achieve good accuracy, split the model among the sensors and the aggregator, and minimize the communication between the sensors and the aggregator. Figure 9.37 demonstrates how SplitNets (in red) achieves higher accuracy for lower latency (running on ImageNet) than different approaches, such as running the DNN on-sensor (All-on-sensor; in green) or on mobile (All-on-aggregator; in blue). Minimal communication is important in TinyML where memory is highly constrained, this way the sensors conduct some of the processing on their chips and then they send only the necessary information to the aggregator. When testing on ImageNet, SplitNets were able to reduce the latency by one order of magnitude on head-mounted devices. This can be helpful when the sensor has its own chip. (Dong et al. 2022)\n\n\n\n\n\n\nFigure 9.37: SplitNets vs other approaches. Credit: Dong et al. (2022).\n\n\nDong, Xin, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T. Kung, and Ziyun Li. 2022. “SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12549–59. IEEE. https://doi.org/10.1109/cvpr52688.2022.01223.\n\n\n\n\nHardware Specific Data Augmentation\nEach edge device may possess unique sensor characteristics, leading to specific noise patterns that can impact model performance. One example is audio data, where variations stemming from the choice of microphone are prevalent. Applications such as Keyword Spotting can experience substantial enhancements by incorporating data recorded from devices similar to those intended for deployment. Fine-tuning of existing models can be employed to adapt the data precisely to the sensor’s distinctive characteristics.", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -979,7 +979,7 @@ "section": "9.5 Software and Framework Support", "text": "9.5 Software and Framework Support\nWhile all of the aforementioned techniques like pruning, quantization, and efficient numerics are well-known, they would remain impractical and inaccessible without extensive software support. For example, directly quantizing weights and activations in a model would require manually modifying the model definition and inserting quantization operations throughout. Similarly, directly pruning model weights requires manipulating weight tensors. Such tedious approaches become infeasible at scale.\nWithout the extensive software innovation across frameworks, optimization tools and hardware integration, most of these techniques would remain theoretical or only viable to experts. Without framework APIs and automation to simplify applying these optimizations, they would not see adoption. Software support makes them accessible to general practitioners and unlocks real-world benefits. In addition, issues such as hyperparameter tuning for pruning, managing the trade-off between model size and accuracy, and ensuring compatibility with target devices pose hurdles that developers must navigate.\n\n9.5.1 Built-in Optimization APIs\nMajor machine learning frameworks like TensorFlow, PyTorch, and MXNet provide libraries and APIs to allow common model optimization techniques to be applied without requiring custom implementations. For example, TensorFlow offers the TensorFlow Model Optimization Toolkit which contains modules like:\n\nquantization - Applies quantization-aware training to convert floating point models to lower precision like int8 with minimal accuracy loss. Handles weight and activation quantization.\nsparsity - Provides pruning APIs to induce sparsity and remove unnecessary connections in models like neural networks. Can prune weights, layers, etc.\nclustering - Supports model compression by clustering weights into groups for higher compression rates.\n\nThese APIs allow users to enable optimization techniques like quantization and pruning without directly modifying model code. Parameters like target sparsity rates, quantization bit-widths etc. can be configured. Similarly, PyTorch provides torch.quantization for converting models to lower precision representations. TorchTensor and TorchModule form the base classes for quantization support. It also offers torch.nn.utils.prune for built-in pruning of models. MXNet offers gluon.contrib layers that add quantization capabilities like fixed point rounding and stochastic rounding of weights/activations during training. This allows quantization to be readily included in gluon models.\nThe core benefit of built-in optimizations is that users can apply them without re-implementing complex techniques. This makes optimized models accessible to a broad range of practitioners. It also ensures best practices are followed by building on research and experience implementing the methods. As new optimizations emerge, frameworks strive to provide native support and APIs where possible to further lower the barrier to efficient ML. The availability of these tools is key to widespread adoption.\n\n\n9.5.2 Automated Optimization Tools\nAutomated optimization tools provided by frameworks can analyze models and automatically apply optimizations like quantization, pruning, and operator fusion to make the process easier and accessible without excessive manual tuning. In effect, this builds on top of the previous section. For example, TensorFlow provides the TensorFlow Model Optimization Toolkit which contains modules like:\n\nQuantizationAwareTraining - Automatically quantizes weights and activations in a model to lower precision like UINT8 or INT8 with minimal accuracy loss. It inserts fake quantization nodes during training so that the model can learn to be quantization-friendly.\nPruning - Automatically removes unnecessary connections in a model based on analysis of weight importance. Can prune entire filters in convolutional layers or attention heads in transformers. Handles iterative re-training to recover any accuracy loss.\nGraphOptimizer - Applies graph optimizations like operator fusion to consolidate operations and reduce execution latency, especially for inference. In Figure 9.38, you can see the original (Source Graph) on the left, and how its operations are transformed (consolidated) on the right. Notice how Block1 in Source Graph has 3 separate steps (Convolution, BiasAdd, and Activation), which are then consolidated together in Block1 on Optimized Graph.\n\n\n\n\n\n\n\nFigure 9.38: GraphOptimizer. Credit: Wess et al. (2020).\n\n\nWess, Matthias, Matvey Ivanov, Christoph Unger, and Anvesh Nookala. 2020. “ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models.” IEEE. https://doi.org/10.1109/ACCESS.2020.3047259.\n\n\nThese automated modules only require the user to provide the original floating point model, and handle the end-to-end optimization pipeline including any re-training to regain accuracy. Other frameworks like PyTorch also offer increasing automation support, for example through torch.quantization.quantize_dynamic. Automated optimization makes efficient ML accessible to practitioners without optimization expertise.\n\n\n9.5.3 Hardware Optimization Libraries\nHardware libraries like TensorRT and TensorFlow XLA allow models to be highly optimized for target hardware through techniques that we discussed earlier.\nQuantization: For example, TensorRT and TensorFlow Lite both support quantization of models during conversion to their format. This provides speedups on mobile SoCs with INT8/INT4 support.\nKernel Optimization: For instance, TensorRT does auto-tuning to optimize CUDA kernels based on the GPU architecture for each layer in the model graph. This extracts maximum throughput.\nOperator Fusion: TensorFlow XLA does aggressive fusion to create optimized binary for TPUs. On mobile, frameworks like NCNN also support fused operators. ` Hardware-Specific Code: Libraries are used to generate optimized binary code specialized for the target hardware. For example, TensorRT uses Nvidia CUDA/cuDNN libraries which are hand-tuned for each GPU architecture. This hardware-specific coding is key for performance. On TinyML devices, this can mean assembly code optimized for a Cortex M4 CPU for example. Vendors provide CMSIS-NN and other libraries.\nData Layout Optimizations - We can efficiently leverage memory hierarchy of hardware like cache and registers through techniques like tensor/weight rearrangement, tiling, and reuse. For example, TensorFlow XLA optimizes buffer layouts to maximize TPU utilization. This helps any memory constrained systems.\nProfiling-based Tuning - We can use profiling tools to identify bottlenecks. For example, adjust kernel fusion levels based on latency profiling. On mobile SoCs, vendors like Qualcomm provide profilers in SNPE to find optimization opportunities in CNNs. This data-driven approach is important for performance.\nBy integrating framework models with these hardware libraries through conversion and execution pipelines, ML developers can achieve significant speedups and efficiency gains from low-level optimizations tailored to the target hardware. The tight integration between software and hardware is key to enabling performant deployment of ML applications, especially on mobile and TinyML devices.\n\n\n9.5.4 Visualizing Optimizations\nImplementing model optimization techniques without visibility into the effects on the model can be challenging. Dedicated tooling or visualization tools can provide critical and useful insight into model changes and helps track the optimization process. Let’s consider the optimizations we considered earlier, such as pruning for sparsity and quantization.\n\nSparsity (ADD SOME LINKS INTO HERE)\nFor example, consider sparsity optimizations. Sparsity visualization tools can provide critical insights into pruned models by mapping out exactly which weights have been removed. For example, sparsity heat maps can use color gradients to indicate the percentage of weights pruned in each layer of a neural network. Layers with higher percentages pruned appear darker (see Figure 9.39). This identifies which layers have been simplified the most by pruning (Souza 2020).\n\n\n\n\n\n\nFigure 9.39: Sparse network heat map. Credit: Numenta.\n\n\n\nTrend plots can also track sparsity over successive pruning rounds - they may show initial rapid pruning followed by more gradual incremental increases. Tracking the current global sparsity along with statistics like average, minimum, and maximum sparsity per-layer in tables or plots provides an overview of the model composition. For a sample convolutional network, these tools could reveal that the first convolution layer is pruned 20% while the final classifier layer is pruned 70% given its redundancy. The global model sparsity may increase from 10% after initial pruning to 40% after five rounds.\nBy making sparsity data visually accessible, practitioners can better understand exactly how their model is being optimized and which areas are being impacted. The visibility enables them to fine-tune and control the pruning process for a given architecture.\nSparsity visualization turns pruning into a transparent technique instead of a black-box operation.\n\n\nQuantization\nConverting models to lower numeric precisions through quantization introduces errors that can impact model accuracy if not properly tracked and addressed. Visualizing quantization error distributions provides valuable insights into the effects of reduced precision numerics applied to different parts of a model. For this, histograms of the quantization errors for weights and activations can be generated. These histograms can reveal the shape of the error distribution - whether they resemble a Gaussian distribution or contain significant outliers and spikes. Figure 9.40 shows the distributions of different quantization methods. Large outliers may indicate issues with particular layers handling the quantization. Comparing the histograms across layers highlights any problem areas standing out with abnormally high errors.\n\n\n\n\n\n\nFigure 9.40: Quantization errors. Credit: Kuzmin et al. (2022).\n\n\nKuzmin, Andrey, Mart Van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, and Tijmen Blankevoort. 2022. “FP8 Quantization: The Power of the Exponent.” https://arxiv.org/abs/2208.09225.\n\n\nActivation visualizations are also important to detect overflow issues. By color mapping the activations before and after quantization, any values pushed outside the intended ranges become visible. This reveals saturation and truncation issues that could skew the information flowing through the model. Detecting these errors allows recalibrating activations to prevent loss of information (Mandal 2022). Figure 9.41 is a color mapping of the AlexNet convolutional kernels.\n\n\n\n\n\n\nFigure 9.41: Color mapping of activations. Credit: Krizhevsky, Sutskever, and Hinton (2012).\n\n\nKrizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” Edited by F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger 25. https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.\n\n\nOther techniques, such as tracking the overall mean square quantization error at each step of the quantization-aware training process identifies fluctuations and divergences. Sudden spikes in the tracking plot may indicate points where quantization is disrupting the model training. Monitoring this metric builds intuition on model behavior under quantization. Together these techniques turn quantization into a transparent process. The empirical insights enable practitioners to properly assess quantization effects. They pinpoint areas of the model architecture or training process to recalibrate based on observed quantization issues. This helps achieve numerically stable and accurate quantized models.\nProviding this data enables practitioners to properly assess the impact of quantization and identify potential problem areas of the model to recalibrate or redesign to be more quantization friendly. This empirical analysis builds intuition on achieving optimal quantization.\nVisualization tools can provide insights that help practitioners better understand the effects of optimizations on their models. The visibility enables correcting issues early before accuracy or performance is impacted significantly. It also aids applying optimizations more effectively for specific models. These optimization analytics help build intuition when transitioning models to more efficient representations.\n\n\n\n9.5.5 Model Conversion and Deployment\nOnce models have been successfully optimized in frameworks like TensorFlow and PyTorch, specialized model conversion and deployment platforms are needed to bridge the gap to running them on target devices.\nTensorFlow Lite - TensorFlow’s platform to convert models to a lightweight format optimized for mobile, embedded and edge devices. Supports optimizations like quantization, kernel fusion, and stripping away unused ops. Models can be executed using optimized TensorFlow Lite kernels on device hardware. Critical for mobile and TinyML deployment.\nONNX Runtime - Performs model conversion and inference for models in the open ONNX model format. Provides optimized kernels, supports hardware accelerators like GPUs, and cross-platform deployment from cloud to edge. Allows framework-agnostic deployment. Figure 9.42 is an ONNX interoperability map, including major popular frameworks.\n\n\n\n\n\n\nFigure 9.42: Interoperablily of ONNX. Credit: TowardsDataScience.\n\n\n\nPyTorch Mobile - Enables PyTorch models to be run on iOS and Android by converting to mobile-optimized representations. Provides efficient mobile implementations of ops like convolution and special functions optimized for mobile hardware.\nThese platforms integrate with hardware drivers, operating systems, and accelerator libraries on devices to execute models efficiently using hardware optimization. They also offload operations to dedicated ML accelerators where present. The availability of these proven, robust deployment platforms bridges the gap between optimizing models in frameworks and actual deployment to billions of devices. They allow users to focus on model development rather than building custom mobile runtimes. Continued innovation to support new hardware and optimizations in these platforms is key to widespread ML optimizations.\nBy providing these optimized deployment pipelines, the entire workflow from training to device deployment can leverage model optimizations to deliver performant ML applications. This end-to-end software infrastructure has helped drive the adoption of on-device ML.", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -990,7 +990,7 @@ "section": "9.6 Conclusion", "text": "9.6 Conclusion\nIn this chapter we’ve discussed model optimization across the software-hardware span. We dove deep into efficient model representation, where we covered the nuances of structured and unstructured pruning and other techniques for model compression such as knowledge distillation and matrix and tensor decomposition. We also dove briefly into edge-specific model design at the parameter and model architecture level, exploring topics like edge-specific models and hardware-aware NAS.\nWe then explored efficient numerics representations, where we covered the basics of numerics, numeric encodings and storage, benefits of efficient numerics, and the nuances of numeric representation with memory usage, computational complexity, hardware compatibility, and tradeoff scenarios. We finished by honing in on an efficient numerics staple: quantization, where we examined its history, calibration, techniques, and interaction with pruning.\nFinally, we looked at how we can make optimizations specific to the hardware we have. We explored how we can find model architectures tailored to the hardware, make optimizations in the kernel to better handle the model, and frameworks built to make the most use out of the hardware. We also looked at how we can go the other way around and build hardware around our specific software and talked about splitting networks to run on multiple processors available on the edge device.\nBy understanding the full picture of the degrees of freedom within model optimization both away and close to the hardware and the tradeoffs to consider when implementing these methods, practitioners can develop a more thoughtful pipeline for compressing their workloads onto edge devices.", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -999,9 +999,9 @@ "href": "contents/optimizations/optimizations.html#sec-model-optimizations-resource", "title": "9  Model Optimizations", "section": "9.7 Resources", - "text": "9.7 Resources\nHere is a curated list of resources to support both students and instructors in their learning and teaching journey. We are continuously working on expanding this collection and will be adding new exercises in the near future.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides serve as a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nQuantization:\n\nQuantization: Part 1.\nQuantization: Part 2.\nPost-Training Quantization (PTQ).\nQuantization-Aware Training (QAT).\n\nPruning:\n\nPruning: Part 1.\nPruning: Part 2.\n\nKnowledge Distillation.\nClustering.\nNeural Architecture Search (NAS):\n\nNAS overview.\nNAS: Part 1.\nNAS: Part 2.\n\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 9.1\nExercise 9.2\nExercise 9.3\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we also offer a series of hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "9.7 Resources\nHere is a curated list of resources to support both students and instructors in their learning and teaching journey. We are continuously working on expanding this collection and will be adding new exercises in the near future.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides serve as a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nQuantization:\n\nQuantization: Part 1.\nQuantization: Part 2.\nPost-Training Quantization (PTQ).\nQuantization-Aware Training (QAT).\n\nPruning:\n\nPruning: Part 1.\nPruning: Part 2.\n\nKnowledge Distillation.\nClustering.\nNeural Architecture Search (NAS):\n\nNAS overview.\nNAS: Part 1.\nNAS: Part 2.\n\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 9.1\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 9.1\nExercise 9.2\nExercise 9.3\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we also offer a series of hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Training", "9  Model Optimizations" ] }, @@ -1012,7 +1012,7 @@ "section": "", "text": "10.1 Introduction\nMachine learning has emerged as a transformative technology across many industries, enabling systems to learn and improve from data. There is a growing demand for embedded ML solutions to deploy machine learning capabilities in real-world environments - where models are built into edge devices like smartphones, home appliances, and autonomous vehicles. However, these edge devices have limited computing resources compared to data center servers.\nSpecialized hardware acceleration enables high-performance machine learning on resource-constrained edge devices. Hardware acceleration refers to using custom silicon chips and architectures to offload compute-intensive ML operations from the main processor. In neural networks, the most intensive computations are the matrix multiplications during inference. Hardware accelerators can optimize these matrix operations, providing 10-100x speedups over general-purpose CPUs. This acceleration unlocks the ability to run advanced neural network models on devices with size, weight, and power constraints in real-time.\nThis chapter overviews hardware acceleration techniques for embedded machine learning and their design tradeoffs. Its goal is to equip readers with an essential background in embedded ML acceleration. This will enable informed hardware selection and software optimization to develop high-performance machine learning capabilities on edge devices.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1023,7 +1023,7 @@ "section": "10.2 Background and Basics", "text": "10.2 Background and Basics\n\n10.2.1 Historical Background\nThe origins of hardware acceleration date back to the 1960s, with the advent of floating point math co-processors to offload calculations from the main CPU. One early example was the Intel 8087 chip released in 1980 to accelerate floating point operations for the 8086 processor. This established the practice of using specialized processors to handle math-intensive workloads efficiently.\nIn the 1990s, the first graphics processing units (GPUs) emerged to process graphics pipelines for rendering and gaming rapidly. Nvidia’s GeForce 256 in 1999 was one of the earliest programmable GPUs capable of running custom software algorithms. GPUs exemplify domain-specific fixed-function accelerators and evolve into parallel programmable accelerators.\nIn the 2000s, GPUs were applied to general-purpose computing under GPGPU. Their high memory bandwidth and computational throughput made them well-suited for math-intensive workloads. This included breakthroughs in using GPUs to accelerate training of deep learning models such as AlexNet in 2012.\nIn recent years, Google’s Tensor Processing Units (TPUs) represent customized ASICs specifically architected for matrix multiplication in deep learning. During inference, their optimized tensor cores achieve higher TeraOPS/watt than CPUs or GPUs. Ongoing innovation includes model compression techniques like pruning and quantization to fit larger neural networks on edge devices.\nThis evolution demonstrates how hardware acceleration has focused on solving compute-intensive bottlenecks, from floating point math to graphics to matrix multiplication for ML. Understanding this history provides a crucial context for specialized AI accelerators today.\n\n\n10.2.2 The Need for Acceleration\nThe evolution of hardware acceleration is closely tied to the broader history of computing. In the early decades, chip design was governed by Moore’s Law and Dennard Scaling, which observed that the number of transistors on an integrated circuit doubled yearly, and their performance (speed) increased as transistors became smaller. At the same time, power density (power per unit area) remains constant. These two laws were held through the single-core era. Figure 10.1 shows the trends of different microprocessor metrics. As the figure denotes, Dennard Scaling fails around the mid-2000s; notice how the clock speed (frequency) remains almost constant even as the number of transistors keeps increasing.\nHowever, as Patterson and Hennessy (2016) describes, technological constraints eventually forced a transition to the multicore era, with chips containing multiple processing cores to deliver performance gains. Power limitations prevented further scaling, which led to “dark silicon” (Dark Silicon), where not all chip areas could be simultaneously active (Xiu 2019).\n\nPatterson, David A, and John L Hennessy. 2016. Computer Organization and Design ARM Edition: The Hardware Software Interface. Morgan kaufmann.\n\nXiu, Liming. 2019. “Time Moore: Exploiting Moore’s Law from the Perspective of Time.” IEEE Solid-State Circuits Mag. 11 (1): 39–55. https://doi.org/10.1109/mssc.2018.2882285.\nThe concept of dark silicon emerged as a consequence of these constraints. “Dark silicon” refers to portions of the chip that cannot be powered simultaneously due to thermal and power limitations. Essentially, as the density of transistors increased, the proportion of the chip that could be actively used without overheating or exceeding power budgets shrank.\nThis phenomenon meant that while chips had more transistors, not all could be operational simultaneously, limiting potential performance gains. This power crisis necessitated a shift to the accelerator era, with specialized hardware units tailored for specific tasks to maximize efficiency. The explosion in AI workloads further drove demand for customized accelerators. Enabling factors included new programming languages, software tools, and manufacturing advances.\n\n\n\n\n\n\nFigure 10.1: Microprocessor trends. Credit: Karl Rupp.\n\n\n\nFundamentally, hardware accelerators are evaluated on performance, power, and silicon area (PPA)—the nature of the target application—whether memory-bound or compute-bound—heavily influences the design. For example, memory-bound workloads demand high bandwidth and low latency access, while compute-bound applications require maximal computational throughput.\n\n\n10.2.3 General Principles\nThe design of specialized hardware accelerators involves navigating complex tradeoffs between performance, power efficiency, silicon area, and workload-specific optimizations. This section outlines core considerations and methodologies for achieving an optimal balance based on application requirements and hardware constraints.\n\nPerformance Within Power Budgets\nPerformance refers to the throughput of computational work per unit of time, commonly measured in floating point operations per second (FLOPS) or frames per second (FPS). Higher performance enables completing more work, but power consumption rises with activity.\nHardware accelerators aim to maximize performance within set power budgets. This requires careful balancing of parallelism, the chip’s clock frequency, the operating voltage, workload optimization, and other techniques to maximize operations per watt.\n\nPerformance = Throughput * Efficiency\nThroughput ~= Parallelism * Clock Frequency\nEfficiency = Operations / Watt\n\nFor example, GPUs achieve high throughput via massively parallel architectures. However, their efficiency is lower than that of customized application-specific integrated circuits (ASICs) like Google’s TPU, which optimize for a specific workload.\n\n\nManaging Silicon Area and Costs\nChip area directly impacts manufacturing cost. Larger die sizes require more materials, lower yields, and higher defect rates. Mulit-die packages help scale designs but add packaging complexity. Silicon area depends on:\n\nComputational resources - e.g., number of cores, memory, caches\nManufacturing process node - smaller transistors enable higher density\nProgramming model - programmed accelerators require more flexibility\n\nAccelerator design involves squeezing maximum performance within area constraints. Techniques like pruning and compression help fit larger models on the chip.\n\n\nWorkload-Specific Optimizations\nThe target workload dictates optimal accelerator architectures. Some of the key considerations include:\n\nMemory vs Compute boundedness: Memory-bound workloads require more memory bandwidth, while compute-bound apps need arithmetic throughput.\nData locality: Data movement should be minimized for efficiency. Near-compute memory helps.\nBit-level operations: Low precision datatypes like INT8/INT4 optimize compute density.\nData parallelism: Multiple replicated compute units allow parallel execution.\nPipelining: Overlapped execution of operations increases throughput.\n\nUnderstanding workload characteristics enables customized acceleration. For example, convolutional neural networks use sliding window operations optimally mapped to spatial arrays of processing elements.\nBy navigating these architectural tradeoffs, hardware accelerators can deliver massive performance gains and enable emerging applications in AI, graphics, scientific computing, and other domains.\n\n\nSustainable Hardware Design\nIn recent years, AI sustainability has become a pressing concern driven by two key factors - the exploding scale of AI workloads and their associated energy consumption.\nFirst, the size of AI models and datasets has rapidly grown. For example, based on OpenAI’s AI computing trends, the amount of computing used to train state-of-the-art models doubles every 3.5 months. This exponential growth requires massive computational resources in data centers.\nSecond, the energy usage of AI training and inference presents sustainability challenges. Data centers running AI applications consume substantial energy, contributing to high carbon emissions. It’s estimated that training a large AI model can have a carbon footprint of 626,000 pounds of CO2 equivalent, almost 5 times the lifetime emissions of an average car.\nAs a result, AI research and practice must prioritize energy efficiency and carbon impact alongside accuracy. There is an increasing focus on model efficiency, data center design, hardware optimization, and other solutions to improve sustainability. Striking a balance between AI progress and environmental responsibility has emerged as a key consideration and an area of active research across the field.\nThe scale of AI systems is expected to keep growing. Developing sustainable AI is crucial for managing the environmental footprint and enabling widespread beneficial deployment of this transformative technology.\nWe will learn about Sustainable AI in a later chapter, where we will discuss it in more detail.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1034,7 +1034,7 @@ "section": "10.3 Accelerator Types", "text": "10.3 Accelerator Types\nHardware accelerators can take on many forms. They can exist as a widget (like the Neural Engine in the Apple M1 chip) or as entire chips specially designed to perform certain tasks very well. This section will examine processors for machine learning workloads along the spectrum from highly specialized ASICs to more general-purpose CPUs. We first focus on custom hardware purpose-built for AI to understand the most extreme optimizations possible when design constraints are removed. This establishes a ceiling for performance and efficiency.\nWe then progressively consider more programmable and adaptable architectures, discussing GPUs and FPGAs. These make tradeoffs in customization to maintain flexibility. Finally, we cover general-purpose CPUs that sacrifice optimizations for a particular workload in exchange for versatile programmability across applications.\nBy structuring the analysis along this spectrum, we aim to illustrate the fundamental tradeoffs between utilization, efficiency, programmability, and flexibility in accelerator design. The optimal balance point depends on the constraints and requirements of the target application. This spectrum perspective provides a framework for reasoning about hardware choices for machine learning and the capabilities required at each level of specialization.\nFigure 10.2 illustrates the complex interplay between flexibility, performance, functional diversity, and area of architecture design. Notice how the ASIC is on the bottom-right corner, with minimal area, flexibility, and power consumption and maximal performance, due to its highly specialized application-specific nature. A key tradeoff is functinoal diversity vs performance: general purpose architechtures can serve diverse applications but their application performance is degraded as compared to more customized architectures.\nThe progression begins with the most specialized option, ASICs purpose-built for AI, to ground our understanding in the maximum possible optimizations before expanding to more generalizable architectures. This structured approach aims to elucidate the accelerator design space.\n\n\n\n\n\n\nFigure 10.2: Design tradeoffs. Credit: El-Rayis (2014).\n\n\nEl-Rayis, A. O. 2014. “Reconfigurable Architectures for the Next Generation of Mobile Device Telecommunications Systems.” : https://www.researchgate.net/publication/292608967.\n\n\n\n10.3.1 Application-Specific Integrated Circuits (ASICs)\nAn Application-Specific Integrated Circuit (ASIC) is a type of integrated circuit (IC) that is custom-designed for a specific application or workload rather than for general-purpose use. Unlike CPUs and GPUs, ASICs do not support multiple applications or workloads. Rather, they are optimized to perform a single task extremely efficiently. The Google TPU is an example of an ASIC.\nASICs achieve this efficiency by tailoring every aspect of the chip design - the underlying logic gates, electronic components, architecture, memory, I/O, and manufacturing process - specifically for the target application. This level of customization allows removing any unnecessary logic or functionality required for general computation. The result is an IC that maximizes performance and power efficiency on the desired workload. The efficiency gains from application-specific hardware are so substantial that these software-centric firms dedicate enormous engineering resources to designing customized ASICs.\nThe rise of more complex machine learning algorithms has made the performance advantages enabled by tailored hardware acceleration a key competitive differentiator, even for companies traditionally concentrated on software engineering. ASICs have become a high-priority investment for major cloud providers aiming to offer faster AI computation.\n\nAdvantages\nDue to their customized nature, ASICs provide significant benefits over general-purpose processors like CPUs and GPUs. The key advantages include the following.\n\nMaximized Performance and Efficiency\nThe most fundamental advantage of ASICs is maximizing performance and power efficiency by customizing the hardware architecture specifically for the target application. Every transistor and design aspect is optimized for the desired workload - no unnecessary logic or overhead is needed to support generic computation.\nFor example, Google’s Tensor Processing Units (TPUs) contain architectures tailored exactly for the matrix multiplication operations used in neural networks. To design the TPU ASICs, Google’s engineering teams need to define the chip specifications clearly, write the architecture description using Hardware Description Languages like Verilog, synthesize the design to map it to hardware components, and carefully place-and-route transistors and wires based on the fabrication process design rules. This complex design process, known as very-large-scale integration (VLSI), allows them to build an optimized IC for machine learning workloads.\nAs a result, TPU ASICs achieve over an order of magnitude higher efficiency in operations per watt than general-purpose GPUs on ML workloads by maximizing performance and minimizing power consumption through a full-stack custom hardware design.\n\n\nSpecialized On-Chip Memory\nASICs incorporate on-chip SRAM and caches specifically optimized to feed data to the computational units. For example, Apple’s M1 system-on-a-chip contains special low-latency SRAM to accelerate the performance of its Neural Engine machine learning hardware. Large local memory with high bandwidth enables data to be kept close to the processing elements. This provides tremendous speed advantages compared to off-chip DRAM access, which can be up to 100x slower.\nData locality and optimizing memory hierarchy are crucial for high throughput and low power. Below is a table, “Numbers Everyone Should Know,” from Jeff Dean.\n\n\n\nOperation\nLatency\nNotes\n\n\n\n\nL1 cache reference\n0.5 ns\n\n\n\nBranch mispredict\n5 ns\n\n\n\nL2 cache reference\n7 ns\n\n\n\nMutex lock/unlock\n25 ns\n\n\n\nMain memory reference\n100 ns\n\n\n\nCompress 1K bytes with Zippy\n3,000 ns\n3 us\n\n\nSend 1 KB bytes over 1 Gbps network\n10,000 ns\n10 us\n\n\nRead 4 KB randomly from SSD\n150,000 ns\n150 us\n\n\nRead 1 MB sequentially from memory\n250,000 ns\n250 us\n\n\nRound trip within same datacenter\n500,000 ns\n0.5 ms\n\n\nRead 1 MB sequentially from SSD\n1,000,000 ns\n1 ms\n\n\nDisk seek\n10,000,000 ns\n10 ms\n\n\nRead 1 MB sequentially from disk\n20,000,000 ns\n20 ms\n\n\nSend packet CA->Netherlands->CA\n150,000,000 ns\n150 ms\n\n\n\n\n\nCustom Datatypes and Operations\nUnlike general-purpose processors, ASICs can be designed to natively support custom datatypes like INT4 or bfloat16, which are widely used in ML models. For instance, Nvidia’s Ampere GPU architecture has dedicated bfloat16 Tensor Cores to accelerate AI workloads. Low-precision datatypes enable higher arithmetic density and performance. ASICs can also directly incorporate non-standard operations common in ML algorithms as primitive operations - for example, natively supporting activation functions like ReLU makes execution more efficient. Please refer to the Efficient Numeric Representations chapter for additional details.\n\n\nHigh Parallelism\nASIC architectures can leverage higher parallelism tuned for the target workload versus general-purpose CPUs or GPUs. More computational units tailored for the application mean more operations execute simultaneously. Highly parallel ASICs achieve tremendous throughput for data parallel workloads like neural network inference.\n\n\nAdvanced Process Nodes\nCutting-edge manufacturing processes allow more transistors to be packed into smaller die areas, increasing density. ASICs designed specifically for high-volume applications can better amortize the costs of cutting-edge process nodes.\n\n\n\nDisadvantages\n\nLong Design Timelines\nThe engineering process of designing and validating an ASIC can take 2-3 years. Synthesizing the architecture using hardware description languages, taping out the chip layout, and fabricating the silicon on advanced process nodes involve long development cycles. For example, to tape out a 7nm chip, teams need to define specifications carefully, write the architecture in HDL, synthesize the logic gates, place components, route all interconnections, and finalize the layout to send for fabrication. This very large-scale integration (VLSI) flow means ASIC design and manufacturing can traditionally take 2-5 years.\nThere are a few key reasons why the long design timelines of ASICs, often 2-3 years, can be challenging for machine learning workloads:\n\nML algorithms evolve rapidly: New model architectures, training techniques, and network optimizations are constantly emerging. For example, Transformers became hugely popular in NLP last few years. When an ASIC finishes tapeout, the optimal architecture for a workload may have changed.\nDatasets grow quickly: ASICs designed for certain model sizes or datatypes can become undersized relative to demand. For instance, natural language models are scaling exponentially with more data and parameters. A chip designed for BERT might not accommodate GPT-3.\nML applications change frequently: The industry focus shifts between computer vision, speech, NLP, recommender systems, etc. An ASIC optimized for image classification may have less relevance in a few years.\nFaster design cycles with GPUs/FPGAs: Programmable accelerators like GPUs can adapt much quicker by upgrading software libraries and frameworks. New algorithms can be deployed without hardware changes.\nTime-to-market needs: Getting a competitive edge in ML requires rapidly experimenting with and deploying new ideas. Waiting several years for an ASIC is different from fast iteration.\n\nThe pace of innovation in ML needs to be better matched to the multi-year timescale for ASIC development. Significant engineering efforts are required to extend ASIC lifespan through modular architectures, process scaling, model compression, and other techniques. However, the rapid evolution of ML makes fixed-function hardware challenging.\n\n\nHigh Non-Recurring Engineering Costs\nThe fixed costs of taking an ASIC from design to high-volume manufacturing can be very capital-intensive, often tens of millions of dollars. Photomask fabrication for taping out chips in advanced process nodes, packaging, and one-time engineering efforts is expensive. For instance, a 7nm chip tape-out alone could cost millions. The high non-recurring engineering (NRE) investment narrows ASIC viability to high-volume production use cases where the upfront cost can be amortized.\n\n\nComplex Integration and Programming\nASICs require extensive software integration work, including drivers, compilers, OS support, and debugging tools. They also need expertise in electrical and thermal packaging. Additionally, efficiently programming ASIC architectures can involve challenges like workload partitioning and scheduling across many parallel units. The customized nature necessitates significant integration efforts to turn raw hardware into fully operational accelerators.\nWhile ASICs provide massive efficiency gains on target applications by tailoring every aspect of the hardware design to one specific task, their fixed nature results in tradeoffs in flexibility and development costs compared to programmable accelerators, which must be weighed based on the application.\n\n\n\n\n10.3.2 Field-Programmable Gate Arrays (FPGAs)\nFPGAs are programmable integrated circuits that can be reconfigured for different applications. Their customizable nature provides advantages for accelerating AI algorithms compared to fixed ASICs or inflexible GPUs. While Google, Meta, and NVIDIA are considering putting ASICs in data centers, Microsoft deployed FPGAs in its data centers (Putnam et al. 2014) in 2011 to efficiently serve diverse data center workloads.\n\nAdvantages\nFPGAs provide several benefits over GPUs and ASICs for accelerating machine learning workloads.\n\nFlexibility Through Reconfigurable Fabric\nThe key advantage of FPGAs is the ability to reconfigure the underlying fabric to implement custom architectures optimized for different models, unlike fixed-function ASICs. For example, quant trading firms use FPGAs to accelerate their algorithms because they change frequently, and the low NRE cost of FPGAs is more viable than tapping out new ASICs. Figure 10.3 contains a table comparing three different FPGAs.\n\n\n\n\n\n\nFigure 10.3: Comparison of FPGAs. Credit: Gwennap (n.d.).\n\n\nGwennap, Linley. n.d. “Certus-NX Innovates General-Purpose FPGAs.”\n\n\nFPGAs comprise basic building blocks - configurable logic blocks, RAM blocks, and interconnects. Vendors provide a base amount of these resources, and engineers program the chips by compiling HDL code into bitstreams that rearrange the fabric into different configurations. This makes FPGAs adaptable as algorithms evolve.\nWhile FPGAs may not achieve the utmost performance and efficiency of workload-specific ASICs, their programmability provides more flexibility as algorithms change. This adaptability makes FPGAs a compelling choice for accelerating evolving machine learning applications. Microsoft has deployed FPGAs in its Azure data centers for machine learning workloads to serve diverse applications instead of ASICs. The programmability enables optimization across changing ML models.\n\n\nCustomized Parallelism and Pipelining\nFPGA architectures can leverage spatial parallelism and pipelining by tailoring the hardware design to mirror the parallelism in ML models. For example, Intel’s HARPv2 FPGA platform splits the layers of an MNIST convolutional network across separate processing elements to maximize throughput. Unique parallel patterns like tree ensemble evaluations are also possible on FPGAs. Deep pipelines with optimized buffering and dataflow can be customized to each model’s structure and datatypes. This level of tailored parallelism and pipelining is not feasible on GPUs.\n\n\nLow Latency On-Chip Memory\nLarge amounts of high-bandwidth on-chip memory enable localized storage for weights and activations. For instance, Xilinx Versal FPGAs contain 32MB of low-latency RAM blocks and dual-channel DDR4 interfaces for external memory. Bringing memory physically closer to the compute units reduces access latency. This provides significant speed advantages over GPUs that traverse PCIe or other system buses to reach off-chip GDDR6 memory.\n\n\nNative Support for Low Precision\nA key advantage of FPGAs is the ability to natively implement any bit width for arithmetic units, such as INT4 or bfloat16, used in quantized ML models. For example, Intel’s Stratix 10 NX FPGAs have dedicated INT8 cores that can achieve up to 143 INT8 TOPS at ~1 TOPS/W Intel Stratix 10 NX FPGA. Lower bit widths increase arithmetic density and performance. FPGAs can even support mixed precision or dynamic precision tuning at runtime.\n\n\n\nDisadvantages\n\nLower Peak Throughput than ASICs\nFPGAs cannot match the raw throughput numbers of ASICs customized for a specific model and precision. The overheads of the reconfigurable fabric compared to fixed function hardware result in lower peak performance. For example, the TPU v5e pods allow up to 256 chips to be connected with more than 100 petaOps of INT8 performance, while FPGAs can offer up to 143 INT8 TOPS or 286 INT4 TOPS Intel Stratix 10 NX FPGA.\nThis is because FPGAs comprise basic building blocks—configurable logic blocks, RAM blocks, and interconnects. Vendors provide a set amount of these resources. To program FPGAs, engineers write HDL code and compile it into bitstreams that rearrange the fabric, which has inherent overheads versus an ASIC purpose-built for one computation.\n\n\nProgramming Complexity\nTo optimize FPGA performance, engineers must program the architectures in low-level hardware description languages like Verilog or VHDL. This requires hardware design expertise and longer development cycles than higher-level software frameworks like TensorFlow. Maximizing utilization can be challenging despite advances in high-level synthesis from C/C++.\n\n\nReconfiguration Overheads\nChanging FPGA configurations requires reloading a new bitstream, which has considerable latency and storage size costs. For example, partial reconfiguration on Xilinx FPGAs can take 100s of milliseconds. This makes dynamically swapping architectures in real-time infeasible. The bitstream storage also consumes on-chip memory.\n\n\nDiminishing Gains on Advanced Nodes\nWhile smaller process nodes greatly benefit ASICs, they provide fewer advantages for FPGAs. At 7nm and below, effects like process variation, thermal constraints, and aging disproportionately impact FPGA performance. The overheads of the configurable fabric also diminish gains compared to fixed-function ASICs.\n\n\nCase Study\nFPGAs have found widespread application in various fields, including medical imaging, robotics, and finance, where they excel in handling computationally intensive machine learning tasks. In medical imaging, an illustrative example is the application of FPGAs for brain tumor segmentation, a traditionally time-consuming and error-prone process. For instance, Xiong et al. developed a quantized segmentation accelerator, which they retrained using the BraTS19 and BraTS20 datasets. Their work yielded remarkable results, achieving over 5x and 44x performance improvements and 11x and 82x energy efficiency gains compared to GPU and CPU implementations, respectively (Xiong et al. 2021).\n\nXiong, Siyu, Guoqing Wu, Xitian Fan, Xuan Feng, Zhongcheng Huang, Wei Cao, Xuegong Zhou, et al. 2021. “MRI-Based Brain Tumor Segmentation Using FPGA-Accelerated Neural Network.” BMC Bioinf. 22 (1): 421. https://doi.org/10.1186/s12859-021-04347-6.\n\n\n\n\n10.3.3 Digital Signal Processors (DSPs)\nThe first digital signal processor core was built in 1948 by Texas Instruments (The Evolution of Audio DSPs). Traditionally, DSPs would have logic to directly access digital/audio data in memory, perform an arithmetic operation (multiply-add-accumulate-MAC was one of the most common operations), and then write the result back to memory. The DSP would include specialized analog components to retrieve digital/audio data.\nOnce we entered the smartphone era, DSPs started encompassing more sophisticated tasks. They required Bluetooth, Wi-Fi, and cellular connectivity. Media also became much more complex. Today, it’s rare to have entire chips dedicated to just DSP, but a System on Chip would include DSPs and general-purpose CPUs. For example, Qualcomm’s Hexagon Digital Signal Processor claims to be a “world-class processor with both CPU and DSP functionality to support deeply embedded processing needs of the mobile platform for both multimedia and modem functions.” Google Tensors, the chip in the Google Pixel phones, also includes CPUs and specialized DSP engines.\n\nAdvantages\nDSPs architecturally provide advantages in vector math throughput, low latency memory access, power efficiency, and support for diverse datatypes - making them well-suited for embedded ML acceleration.\n\nOptimized Architecture for Vector Math\nDSPs contain specialized data paths, register files, and instructions optimized specifically for vector math operations commonly used in machine learning models. This includes dot product engines, MAC units, and SIMD capabilities tailored for vector/matrix calculations. For example, the CEVA-XM6 DSP (“Ceva SensPro Fuses AI and Vector DSP”) has 512-bit vector units to accelerate convolutions. This efficiency on vector math workloads is far beyond general CPUs.\n\n\nLow Latency On-Chip Memory\nDSPs integrate large amounts of fast on-chip SRAM memory to hold data locally for processing. Bringing memory physically closer to the computation units reduces access latency. For example, Analog’s SHARC+ DSP contains 10MB of on-chip SRAM. This high-bandwidth local memory provides speed advantages for real-time applications.\n\n\nPower Efficiency\nDSPs are engineered to provide high performance per watt on digital signal workloads. Efficient data paths, parallelism, and memory architectures enable trillions of math operations per second within tight mobile power budgets. For example, Qualcomm’s Hexagon DSP can deliver 4 trillion operations per second (TOPS) while consuming minimal watts.\n\n\nSupport for Integer and Floating Point Math\nUnlike GPUs that excel at single or half precision, DSPs can natively support 8/16-bit integer and 32-bit floating point datatypes used across ML models. Some DSPs support dot product acceleration at INT8 precision for quantized neural networks.\n\n\n\nDisadvantages\nDSPs make architectural tradeoffs that limit peak throughput, precision, and model capacity compared to other AI accelerators. However, their advantages in power efficiency and integer math make them a strong edge computing option. So, while DSPs provide some benefits over CPUs, they also come with limitations for machine learning workloads:\n\nLower Peak Throughput than ASICs/GPUs\nDSPs cannot match the raw computational throughput of GPUs or customized ASICs designed specifically for machine learning. For example, Qualcomm’s Cloud AI 100 ASIC delivers 480 TOPS on INT8, while their Hexagon DSP provides 10 TOPS. DSPs lack the massive parallelism of GPU SM units.\n\n\nSlower Double Precision Performance\nMost DSPs must be optimized for the higher precision floating point needed in some ML models. Their dot product engines focus on INT8/16 and FP32, which provide better power efficiency. However, 64-bit floating point throughput is much lower, which can limit usage in models requiring high precision.\n\n\nConstrained Model Capacity\nThe limited on-chip memory of DSPs constrains the model sizes that can be run. Large deep learning models with hundreds of megabytes of parameters would exceed on-chip SRAM capacity. DSPs are best suited for small to mid-sized models targeted for edge devices.\n\n\nProgramming Complexity\nEfficient programming of DSP architectures requires expertise in parallel programming and optimizing data access patterns. Their specialized microarchitectures have a steeper learning curve than high-level software frameworks, making development more complex.\n\n\n\n\n10.3.4 Graphics Processing Units (GPUs)\nThe term graphics processing unit has existed since at least the 1980s. There had always been a demand for graphics hardware in video game consoles (high demand, needed to be relatively lower cost) and scientific simulations (lower demand, but higher resolution, could be at a high price point).\nThe term was popularized, however, in 1999 when NVIDIA launched the GeForce 256, mainly targeting the PC games market sector (Lindholm et al. 2008). As PC games became more sophisticated, NVIDIA GPUs became more programmable. Soon, users realized they could take advantage of this programmability, run various non-graphics-related workloads on GPUs, and benefit from the underlying architecture. And so, in the late 2000s, GPUs became general-purpose graphics processing units or GP-GPUs.\n\nLindholm, Erik, John Nickolls, Stuart Oberman, and John Montrym. 2008. “NVIDIA Tesla: A Unified Graphics and Computing Architecture.” IEEE Micro 28 (2): 39–55. https://doi.org/10.1109/mm.2008.31.\nIntel Arc Graphics and AMD Radeon RX have also developed their GPUs over time.\n\nAdvantages\n\nHigh Computational Throughput\nThe key advantage of GPUs is their ability to perform massively parallel floating-point calculations optimized for computer graphics and linear algebra (Raina, Madhavan, and Ng 2009). Modern GPUs like Nvidia’s A100 offer up to 19.5 teraflops of FP32 performance with 6912 CUDA cores and 40GB of graphics memory tightly coupled with 1.6TB/s of graphics memory bandwidth.\n\nRaina, Rajat, Anand Madhavan, and Andrew Y. Ng. 2009. “Large-Scale Deep Unsupervised Learning Using Graphics Processors.” In Proceedings of the 26th Annual International Conference on Machine Learning, edited by Andrea Pohoreckyj Danyluk, Léon Bottou, and Michael L. Littman, 382:873–80. ACM International Conference Proceeding Series. ACM. https://doi.org/10.1145/1553374.1553486.\nThis raw throughput stems from the highly parallel streaming multiprocessor (SM) architecture tailored for data-parallel workloads (Zhihao Jia, Zaharia, and Aiken 2019). Each SM contains hundreds of scalar cores optimized for float32/64 math. With thousands of SMs on a chip, GPUs are purpose-built for matrix multiplication and vector operations used throughout neural networks.\nFor example, Nvidia’s latest H100 GPU provides 4000 TFLOPs of FP8, 2000 TFLOPs of FP16, 1000 TFLOPs of TF32, 67 TFLOPs of FP32 and 34 TFLOPs of FP64 Compute performance, which can dramatically accelerate large batch training on models like BERT, GPT-3, and other transformer architectures. The scalable parallelism of GPUs is key to speeding up computationally intensive deep learning.\n\n\nMature Software Ecosystem\nNvidia provides extensive runtime libraries like cuDNN and cuBLAS that are highly optimized for deep learning primitives. Frameworks like TensorFlow and PyTorch integrate with these libraries to enable GPU acceleration without direct programming. CUDA provides lower-level control for custom computations.\nThis ecosystem enables quick leveraging of GPUs via high-level Python without GPU programming expertise. Known workflows and abstractions provide a convenient on-ramp for scaling up deep learning experiments. The software maturity supplements the throughput advantages.\n\n\nBroad Availability\nThe economies of scale of graphics processing make GPUs broadly accessible in data centers, cloud platforms like AWS and GCP, and desktop workstations. Their availability in research environments has provided a convenient ML experimentation and innovation platform. For example, nearly every state-of-the-art deep learning result has involved GPU acceleration because of this ubiquity. The broad access supplements the software maturity to make GPUs the standard ML accelerator.\n\n\nProgrammable Architecture\nWhile not as flexible as FPGAs, GPUs provide programmability via CUDA and shader languages to customize computations. Developers can optimize data access patterns, create new ops, and tune precisions for evolving models and algorithms.\n\n\n\nDisadvantages\nWhile GPUs have become the standard accelerator for deep learning, their architecture has some key downsides.\n\nLess Efficient than Custom ASICs\nThe statement “GPUs are less efficient than ASICs” could spark intense debate within the ML/AI field and cause this book to explode.\nTypically, GPUs are perceived as less efficient than ASICs because the latter are custom-built for specific tasks and thus can operate more efficiently by design. With their general-purpose architecture, GPUs are inherently more versatile and programmable, catering to a broad spectrum of computational tasks beyond ML/AI.\nHowever, modern GPUs have evolved to include specialized hardware support for essential AI operations, such as generalized matrix multiplication (GEMM) and other matrix operations, native support for quantization, and native support for pruning, which are critical for running ML models effectively. These enhancements have significantly improved the efficiency of GPUs for AI tasks to the point where they can rival the performance of ASICs for certain applications.\nConsequently, contemporary GPUs are convergent, incorporating specialized ASIC-like capabilities within a flexible, general-purpose processing framework. This adaptability has blurred the lines between the two types of hardware. GPUs offer a strong balance of specialization and programmability that is well-suited to the dynamic needs of ML/AI research and development.\n\n\nHigh Memory Bandwidth Needs\nThe massively parallel architecture requires tremendous memory bandwidth to supply thousands of cores, as shown in Figure 1. For example, the Nvidia A100 GPU requires 1.6TB/sec to fully saturate its computer. GPUs rely on wide 384-bit memory buses to high-bandwidth GDDR6 RAM, but even the fastest GDDR6 tops out at around 1 TB/sec. This dependence on external DRAM incurs latency and power overheads.\n\n\nProgramming Complexity\nWhile tools like CUDA help, optimally mapping and partitioning ML workloads across the massively parallel GPU architecture remains challenging, achieving both high utilization and memory locality requires low-level tuning (Zhe Jia et al. 2018). Abstractions like TensorFlow can leave performance on the table.\n\nJia, Zhe, Marco Maggioni, Benjamin Staiger, and Daniele P. Scarpazza. 2018. “Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking.” ArXiv Preprint. https://arxiv.org/abs/1804.06826.\n\n\nLimited On-Chip Memory\nGPUs have relatively small on-chip memory caches compared to ML models’ large working set requirements during training. They rely on high bandwidth access to external DRAM, which ASICs minimize with large on-chip SRAM.\n\n\nFixed Architecture\nUnlike FPGAs, the fundamental GPU architecture cannot be altered post-manufacture. This constraint limits adapting to novel ML workloads or layers. The CPU-GPU boundary also creates data movement overheads.\n\n\n\nCase Study\nThe recent groundbreaking research conducted by OpenAI (Brown et al. 2020) with their GPT-3 model. GPT-3, a language model with 175 billion parameters, demonstrated unprecedented language understanding and generation capabilities. Its training, which would have taken months on conventional CPUs, was accomplished in a matter of days using powerful GPUs, thus pushing the boundaries of natural language processing (NLP) capabilities.\n\n\n\n10.3.5 Central Processing Units (CPUs)\nThe term CPUs has a long history that dates back to 1955 (Weik 1955) while the first microprocessor CPU-the Intel 4004-was invented in 1971 (Who Invented the Microprocessor?). Compilers compile high-level programming languages like Python, Java, or C to assemble instructions (x86, ARM, RISC-V, etc.) for CPUs to process. The set of instructions a CPU understands is called the “instruction set.” It must be agreed upon by both the hardware and software running atop it (See section 5 for a more in-depth description of instruction set architectures-ISAs).\n\nWeik, Martin H. 1955. A Survey of Domestic Electronic Digital Computing Systems. Ballistic Research Laboratories.\nAn overview of significant developments in CPUs:\n\n** Single-core Era (1950s- 2000): ** This era is known for aggressive microarchitectural improvements. Techniques like speculative execution (executing an instruction before the previous one was done), out-of-order execution (re-ordering instructions to be more effective), and wider issue widths (executing multiple instructions at once) were implemented to increase instruction throughput. The term “System on Chip” also originated in this era as different analog components (components designed with transistors) and digital components (components designed with hardware description languages that are mapped to transistors) were put on the same platform to achieve some task.\nMulticore Era (2000s): Driven by the decrease of Moore’s Law, this era is marked by scaling the number of cores within a CPU. Now, tasks can be split across many different cores, each with its own datapath and control unit. Many of the issues in this era pertained to how to share certain resources, which resources to share, and how to maintain coherency and consistency across all the cores.\nSea of accelerators (2010s): Again, driven by the decrease of Moore’s law, this era is marked by offloading more complicated tasks to accelerators (widgets) attached to the main datapath in CPUs. It’s common to see accelerators dedicated to various AI workloads, as well as image/digital processing, and cryptography. In these designs, CPUs are often described more as judges, deciding which tasks should be processed rather than doing the processing itself. Any task could still be run on the CPU rather than the accelerators, but the CPU would generally be slower. However, the cost of designing and programming the accelerator became a non-trivial hurdle that sparked interest in design-specific libraries (DSLs).\nPresence in data centers: Although we often hear that GPUs dominate the data center marker, CPUs are still well suited for tasks that don’t inherently possess a large amount of parallelism. CPUs often handle serial and small tasks and coordinate the data center.\nOn the edge: Given the tighter resource constraints on the edge, edge CPUs often only implement a subset of the techniques developed in the sing-core era because these optimizations tend to be heavy on power and area consumption. Edge CPUs still maintain a relatively simple datapath with limited memory capacities.\n\nTraditionally, CPUs have been synonymous with general-purpose computing, a term that has also changed as the “average” workload a consumer would run changes over time. For example, floating point components were once considered reserved for “scientific computing,” they were usually implemented as a co-processor (a modular component that worked with the datapath) and seldom deployed to average consumers. Compare this attitude to today, where FPUs are built into every datapath.\n\nAdvantages\nWhile raw throughput is limited, general-purpose CPUs provide practical AI acceleration benefits.\n\nGeneral Programmability\nCPUs support diverse workloads beyond ML, providing flexible general-purpose programmability. This versatility comes from their standardized instruction sets and mature compiler ecosystems, which allow running any application, from databases and web servers to analytics pipelines (Hennessy and Patterson 2019).\n\nHennessy, John L., and David A. Patterson. 2019. “A New Golden Age for Computer Architecture.” Commun. ACM 62 (2): 48–60. https://doi.org/10.1145/3282307.\nThis avoids the need for dedicated ML accelerators and enables leveraging existing CPU-based infrastructure for basic ML deployment. For example, X86 servers from vendors like Intel and AMD can run common ML frameworks using Python and TensorFlow packages alongside other enterprise workloads.\n\n\nMature Software Ecosystem\nFor decades, highly optimized math libraries like BLAS, LAPACK, and FFTW have leveraged vectorized instructions and multithreading on CPUs (Dongarra 2009). Major ML frameworks like PyTorch, TensorFlow, and SciKit-Learn are designed to integrate seamlessly with these CPU math kernels.\n\nDongarra, Jack J. 2009. “The Evolution of High Performance Computing on System z.” IBM J. Res. Dev. 53: 3–4.\nHardware vendors like Intel and AMD also provide low-level libraries to optimize performance for deep learning primitives fully (AI Inference Acceleration on CPUs). This robust, mature software ecosystem allows quickly deploying ML on existing CPU infrastructure.\n\n\nWide Availability\nThe economies of scale of CPU manufacturing, driven by demand across many markets like PCs, servers, and mobile, make them ubiquitously available. Intel CPUs, for example, have powered most servers for decades (Ranganathan 2011). This wide availability in data centers reduces hardware costs for basic ML deployment.\n\nRanganathan, Parthasarathy. 2011. “From Microprocessors to Nanostores: Rethinking Data-Centric Systems.” Computer 44 (1): 39–48. https://doi.org/10.1109/mc.2011.18.\nEven small embedded devices typically integrate some CPU, enabling edge inference. The ubiquity reduces the need to purchase specialized ML accelerators in many situations.\n\n\nLow Power for Inference\nOptimizations like ARM Neon and Intel AVX vector extensions provide power-efficient integer and floating point throughput optimized for “bursty” workloads such as inference (Ignatov et al. 2018). While slower than GPUs, CPU inference can be deployed in power-constrained environments. For example, ARM’s Cortex-M CPUs now deliver over 1 TOPS of INT8 performance under 1W, enabling keyword spotting and vision applications on edge devices (ARM).\n\n\n\nDisadvantages\nWhile providing some advantages, general-purpose CPUs also have limitations for AI workloads.\n\nLower Throughput than Accelerators\nCPUs lack the specialized architectures for massively parallel processing that GPUs and other accelerators provide. Their general-purpose design reduces computational throughput for the highly parallelizable math operations common in ML models (N. P. Jouppi et al. 2017a).\n\nJouppi, Norman P., Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, et al. 2017a. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” In Proceedings of the 44th Annual International Symposium on Computer Architecture, 1–12. ISCA ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\n\n\nNot Optimized for Data Parallelism\nThe architectures of CPUs are not specifically optimized for data parallel workloads inherent to AI (Sze et al. 2017). They allocate substantial silicon area to instruction decoding, speculative execution, caching, and flow control that provides little benefit for the array operations used in neural networks (AI Inference Acceleration on CPUs). However, modern CPUs are equipped with vector instructions like AVX-512 specifically to accelerate certain key operations like matrix multiplication.\nGPU streaming multiprocessors, for example, devote most transistors to floating point units instead of complex branch prediction logic. This specialization allows much higher utilization for ML math.\n\n\nHigher Memory Latency\nCPUs suffer from higher latency accessing main memory relative to GPUs and other accelerators (DDR). Techniques like tiling and caching can help, but the physical separation from off-chip RAM bottlenecks data-intensive ML workloads. This emphasizes the need for specialized memory architectures in ML hardware.\n\n\nPower Inefficiency Under Heavy Workloads\nWhile suitable for intermittent inference, sustaining near-peak throughput for training results in inefficient power consumption on CPUs, especially mobile CPUs (Ignatov et al. 2018). Accelerators explicitly optimize the data flow, memory, and computation for sustained ML workloads. CPUs are energy-inefficient for training large models.\n\n\n\n\n10.3.6 Comparison\n\n\n\n\n\n\n\n\n\nAccelerator\nDescription\nKey Advantages\nKey Disadvantages\n\n\n\n\nASICs\nCustom ICs designed for target workloads like AI inference\nMaximizes perf/watt. Optimized for tensor ops Low latency on-chip memory\nFixed architecture lacks flexibility High NRE cost Long design cycles\n\n\nFPGAs\nReconfigurable fabric with programmable logic and routing\nFlexible architecture Low latency memory access\nLower perf/watt than ASICs Complex programming\n\n\nGPUs\nOriginally for graphics, now used for neural network acceleration\nHigh throughput Parallel scalability Software ecosystem with CUDA\nNot as power efficient as ASICs. Require high memory bandwidth\n\n\nCPUs\nGeneral purpose processors\nProgrammability Ubiquitous availability\nLower performance for AI workloads\n\n\n\nIn general, CPUs provide a readily available baseline, GPUs deliver broadly accessible acceleration, FPGAs offer programmability, and ASICs maximize efficiency for fixed functions. The optimal choice depends on the target application’s scale, cost, flexibility, and other requirements.\nAlthough first developed for data center deployment, where [cite some benefit that Google cites], Google has also put considerable effort into developing Edge TPUs. These Edge TPUs maintain the inspiration from systolic arrays but are tailored to the limited resources accessible at the edge.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1045,7 +1045,7 @@ "section": "10.4 Hardware-Software Co-Design", "text": "10.4 Hardware-Software Co-Design\nHardware-software co-design is based on the principle that AI systems achieve optimal performance and efficiency when the hardware and software components are designed in tight integration. This involves an iterative, collaborative design cycle where the hardware architecture and software algorithms are concurrently developed and refined with continuous feedback between teams.\nFor example, a new neural network model may be prototyped on an FPGA-based accelerator platform to obtain real performance data early in the design process. These results provide feedback to the hardware designers on potential optimizations and the software developers on refinements to the model or framework to better leverage the hardware capabilities. This level of synergy is difficult to achieve with the common practice of software being developed independently to deploy on fixed commodity hardware.\nCo-design is critical for embedded AI systems facing significant resource constraints like low power budgets, limited memory and compute capacity, and real-time latency requirements. Tight integration between algorithm developers and hardware architects helps unlock optimizations across the stack to meet these restrictions. Enabling techniques include algorithmic improvements like neural architecture search and pruning and hardware advances like specialized dataflows and memory hierarchies.\nBy bringing hardware and software design together, rather than developing them separately, holistic optimizations can be made that maximize performance and efficiency. The next sections provide more details on specific co-design approaches.\n\n10.4.1 The Need for Co-Design\nSeveral key factors make a collaborative hardware-software co-design approach essential for building efficient AI systems.\n\nIncreasing Model Size and Complexity\nState-of-the-art AI models have been rapidly growing in size, enabled by advances in neural architecture design and the availability of large datasets. For example, the GPT-3 language model contains 175 billion parameters (Brown et al. 2020), requiring huge computational resources for training. This explosion in model complexity necessitates co-design to develop efficient hardware and algorithms in tandem. Techniques like model compression (Cheng et al. 2018) and quantization must be co-optimized with the hardware architecture.\n\nBrown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.\n\nCheng, Yu, Duo Wang, Pan Zhou, and Tao Zhang. 2018. “Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges.” IEEE Signal Process Mag. 35 (1): 126–36. https://doi.org/10.1109/msp.2017.2765695.\n\n\nConstraints of Embedded Deployment\nDeploying AI applications on edge devices like mobile phones or smart home appliances introduces significant constraints on energy, memory, and silicon area (Sze et al. 2017). Enable real-time inference under these restrictions requires co-exploring hardware optimizations like specialized dataflows and compression with efficient neural network design and pruning techniques. Co-design maximizes performance within tight deployment constraints.\n\n\nRapid Evolution of AI Algorithms\nAI is rapidly evolving, with new model architectures, training methodologies, and software frameworks constantly emerging. For example, Transformers have recently become hugely popular for NLP (Young et al. 2018). Keeping pace with these algorithmic innovations requires hardware-software co-design to adapt platforms and avoid accrued technical debt quickly.\n\nYoung, Tom, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. 2018. “Recent Trends in Deep Learning Based Natural Language Processing [Review Article].” IEEE Comput. Intell. Mag. 13 (3): 55–75. https://doi.org/10.1109/mci.2018.2840738.\n\n\nComplex Hardware-Software Interactions\nMany subtle interactions and tradeoffs between hardware architectural choices and software optimizations significantly impact overall efficiency. For instance, techniques like tensor partitioning and batching affect parallelism and data access patterns impact memory utilization. Co-design provides a cross-layer perspective to unravel these dependencies.\n\n\nNeed for Specialization\nAI workloads benefit from specialized operations like low-precision math and customized memory hierarchies. This motivates incorporating custom hardware tailored to neural network algorithms rather than relying solely on flexible software running on generic hardware (Sze et al. 2017). However, the software stack must explicitly target custom hardware operations to realize the benefits.\n\n\nDemand for Higher Efficiency\nWith growing model complexity, diminishing returns and overhead from optimizing only the hardware or software in isolation (Putnam et al. 2014) arise. Inevitable tradeoffs arise that require global optimization across layers. Jointly co-designing hardware and software provides large compound efficiency gains.\n\nPutnam, Andrew, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, et al. 2014. “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services.” ACM SIGARCH Computer Architecture News 42 (3): 13–24. https://doi.org/10.1145/2678373.2665678.\n\n\n\n10.4.2 Principles of Hardware-Software Co-Design\nThe underlying hardware architecture and software stack must be tightly integrated and co-optimized to build high-performance and efficient AI systems. Neither can be designed in isolation; maximizing their synergies requires a holistic approach known as hardware-software co-design.\nThe key goal is tailoring the hardware capabilities to match the algorithms and workloads run by the software. This requires a feedback loop between hardware architects and software developers to converge on optimized solutions. Several techniques enable effective co-design:\n\nHardware-Aware Software Optimization\nThe software stack can be optimized to leverage the underlying hardware capabilities better:\n\nParallelism: Parallelize matrix computations like convolution or attention layers to maximize throughput on vector engines.\nMemory Optimization: Tune data layouts to improve cache locality based on hardware profiling. This maximizes reuse and minimizes expensive DRAM access.\nCompression: Use sparsity in the models to reduce storage space and save on computation by zero-skipping operations.\nCustom Operations: Incorporate specialized operations like low-precision INT4 or bfloat16 into models to capitalize on dedicated hardware support.\nDataflow Mapping: Explicitly map model stages to computational units to optimize data movement on hardware.\n\n\n\nAlgorithm-Driven Hardware Specialization\nHardware can be tailored to suit the characteristics of ML algorithms better:\n\nCustom Datatypes: Support low precision INT8/4 or bfloat16 in hardware for higher arithmetic density.\nOn-Chip Memory: Increase SRAM bandwidth and lower access latency to match model memory access patterns.\nDomain-Specific Ops: Add hardware units for key ML functions like FFTs or matrix multiplication to reduce latency and energy.\nModel Profiling: Use model simulation and profiling to identify computational hotspots and optimize hardware.\n\nThe key is collaborative feedback - insights from hardware profiling guide software optimizations, while algorithmic advances inform hardware specialization. This mutual enhancement provides multiplicative efficiency gains compared to isolated efforts.\n\n\nAlgorithm-Hardware Co-exploration\nA powerful co-design technique involves jointly exploring innovations in neural network architectures and custom hardware design. This allows for finding ideal pairings tailored to each other’s strengths (Sze et al. 2017).\n\nSze, Vivienne, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2017. “Efficient Processing of Deep Neural Networks: A Tutorial and Survey.” Proc. IEEE 105 (12): 2295–2329. https://doi.org/10.1109/jproc.2017.2761740.\n\nHoward, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” ArXiv Preprint. https://arxiv.org/abs/1704.04861.\n\nJacob, Benoit, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2704–13. IEEE. https://doi.org/10.1109/cvpr.2018.00286.\n\nGale, Trevor, Erich Elsen, and Sara Hooker. 2019. “The State of Sparsity in Deep Neural Networks.” ArXiv Preprint abs/1902.09574. https://arxiv.org/abs/1902.09574.\n\nMishra, Asit K., Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, and Paulius Micikevicius. 2021. “Accelerating Sparse Deep Neural Networks.” CoRR abs/2104.08378. https://arxiv.org/abs/2104.08378.\nFor instance, the shift to mobile architectures like MobileNets (Howard et al. 2017) was guided by edge device constraints like model size and latency. The quantization (Jacob et al. 2018) and pruning techniques (Gale, Elsen, and Hooker 2019) that unlocked these efficient models became possible thanks to hardware accelerators with native low-precision integer support and pruning support (Mishra et al. 2021).\nAttention-based models have thrived on massively parallel GPUs and ASICs, where their computation maps well spatially, as opposed to RNN architectures, which rely on sequential processing. The co-evolution of algorithms and hardware unlocked new capabilities.\nEffective co-exploration requires close collaboration between algorithm researchers and hardware architects. Rapid prototyping on FPGAs (C. Zhang et al. 2015) or specialized AI simulators allows quick evaluation of different pairings of model architectures and hardware designs pre-silicon.\n\nZhang, Chen, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Optimizing Cong. 2015. “FPGA-Based Accelerator Design for Deep Convolutional Neural Networks Proceedings of the 2015 ACM.” In SIGDA International Symposium on Field-Programmable Gate Arrays-FPGA, 15:161–70.\nFor example, Google’s TPU architecture evolved with optimizations to TensorFlow models to maximize performance on image classification. This tight feedback loop yielded models tailored for the TPU that would have been unlikely in isolation.\nStudies have shown 2-5x higher performance and efficiency gains with algorithm-hardware co-exploration than isolated algorithm or hardware optimization efforts (Suda et al. 2016). Parallelizing the joint development also reduces time-to-deployment.\n\nSuda, Naveen, Vikas Chandra, Ganesh Dasika, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, and Yu Cao. 2016. “Throughput-Optimized OpenCL-Based FPGA Accelerator for Large-Scale Convolutional Neural Networks.” In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 16–25. ACM. https://doi.org/10.1145/2847263.2847276.\nOverall, exploring the tight interdependencies between model innovation and hardware advances unlocks opportunities that must be visible when tackled sequentially. This synergistic co-design yields solutions greater than the sum of their parts.\n\n\n\n10.4.3 Challenges\nWhile collaborative co-design can improve efficiency, adaptability, and time to market, it also has engineering and organizational challenges.\n\nIncreased Prototyping Costs\nMore extensive prototyping is required to evaluate different hardware-software pairings. The need for rapid, iterative prototypes on FPGAs or emulators increases validation overhead. For example, Microsoft found that more prototypes were needed to co-design an AI accelerator than sequential design (Fowers et al. 2018).\n\nFowers, Jeremy, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, et al. 2018. “A Configurable Cloud-Scale DNN Processor for Real-Time AI.” In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 1–14. IEEE; IEEE. https://doi.org/10.1109/isca.2018.00012.\n\n\nTeam and Organizational Hurdles\nCo-design requires close coordination between traditionally disconnected hardware and software groups. This could introduce communication issues or misaligned priorities and schedules. Navigating different engineering workflows is also challenging. Some organizational inertia to adopting integrated practices may exist.\n\n\nSimulation and Modeling Complexity\nCapturing subtle interactions between hardware and software layers for joint simulation and modeling adds significant complexity. Full cross-layer abstractions are difficult to construct quantitatively before implementation, making holistic optimizations harder to quantify ahead of time.\n\n\nOver-Specialization Risks\nTight co-design bears the risk of overfitting optimizations to current algorithms, sacrificing generality. For example, hardware tuned exclusively for Transformer models could underperform on future techniques. Maintaining flexibility requires foresight.\n\n\nAdoption Challenges\nEngineers comfortable with established discrete hardware or software design practices may only accept familiar collaborative workflows. Despite the long-term benefits, projects could face friction in transitioning to co-design.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1056,7 +1056,7 @@ "section": "10.5 Software for AI Hardware", "text": "10.5 Software for AI Hardware\nSpecialized hardware accelerators like GPUs, TPUs, and FPGAs are essential to delivering high-performance artificial intelligence applications. However, an extensive software stack is required to leverage these hardware platforms effectively, spanning the entire development and deployment lifecycle. Frameworks and libraries form the backbone of AI hardware, offering sets of robust, pre-built code, algorithms, and functions specifically optimized to perform various AI tasks on different hardware. They are designed to simplify the complexities of utilizing the hardware from scratch, which can be time-consuming and prone to error. Software plays an important role in the following:\n\nProviding programming abstractions and models like CUDA and OpenCL to map computations onto accelerators.\nIntegrating accelerators into popular deep learning frameworks like TensorFlow and PyTorch.\nCompilers and tools to optimize across the hardware-software stack.\nSimulation platforms to model hardware and software together.\nInfrastructure to manage deployment on accelerators.\n\nThis expansive software ecosystem is as important as the hardware in delivering performant and efficient AI applications. This section overviews the tools available at each stack layer to enable developers to build and run AI systems powered by hardware acceleration.\n\n10.5.1 Programming Models\nProgramming models provide abstractions to map computations and data onto heterogeneous hardware accelerators:\n\nCUDA: Nvidia’s parallel programming model to leverage GPUs using extensions to languages like C/C++. Allows launching kernels across GPU cores (Luebke 2008).\nOpenCL: Open standard for writing programs spanning CPUs, GPUs, FPGAs, and other accelerators. Specifies a heterogeneous computing framework (Munshi 2009).\nOpenGL/WebGL: 3D graphics programming interfaces that can map general-purpose code to GPU cores (Segal and Akeley 1999).\nVerilog/VHDL: Hardware description languages (HDLs) used to configure FPGAs as AI accelerators by specifying digital circuits (Gannot and Ligthart 1994).\nTVM: A Compiler framework providing a Python frontend to optimize and map deep learning models onto diverse hardware backends (Chen et al. 2018).\n\n\nLuebke, David. 2008. “CUDA: Scalable Parallel Programming for High-Performance Scientific Computing.” In 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 836–38. IEEE. https://doi.org/10.1109/isbi.2008.4541126.\n\nMunshi, Aaftab. 2009. “The OpenCL Specification.” In 2009 IEEE Hot Chips 21 Symposium (HCS), 1–314. IEEE. https://doi.org/10.1109/hotchips.2009.7478342.\n\nSegal, Mark, and Kurt Akeley. 1999. “The OpenGL Graphics System: A Specification (Version 1.1).”\n\nGannot, G., and M. Ligthart. 1994. “Verilog HDL Based FPGA Design.” In International Verilog HDL Conference, 86–92. IEEE. https://doi.org/10.1109/ivc.1994.323743.\n\nChen, Tianqi, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, et al. 2018. “TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.” In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 578–94.\nKey challenges include expressing parallelism, managing memory across devices, and matching algorithms to hardware capabilities. Abstractions must balance portability with allowing hardware customization. Programming models enable developers to harness accelerators without hardware expertise. These details are discussed in the AI frameworks section.\n\n\n\n\n\n\nExercise 10.1: Software for AI Hardware - TVM\n\n\n\n\n\nWe’ve learned that fancy AI hardware needs special software to work magic. TVM is like a super-smart translator, turning your code into instructions that accelerators understand. In this Colab, we’ll use TVM to make a pretend accelerator called VTA do matrix multiplication super fast. Ready to see how software powers up hardware?\n\n\n\n\n\n\n10.5.2 Libraries and Runtimes\nSpecialized libraries and runtimes provide software abstractions to access and maximize the utilization of AI accelerators:\n\nMath Libraries: Highly optimized implementations of linear algebra primitives like GEMM, FFTs, convolutions, etc., tailored to the target hardware. Nvidia cuBLAS, Intel MKL, and Arm compute libraries are examples.\nFramework Integrations: Libraries to accelerate deep learning frameworks like TensorFlow, PyTorch, and MXNet on supported hardware. For example, cuDNN accelerates CNNs on Nvidia GPUs.\nRuntimes: Software to handle accelerator execution, including scheduling, synchronization, memory management, and other tasks. Nvidia TensorRT is an inference optimizer and runtime.\nDrivers and Firmware: Low-level software to interface with hardware, initialize devices, and handle execution. Vendors like Xilinx provide drivers for their accelerator boards.\n\nFor instance, PyTorch integrators use cuDNN and cuBLAS libraries to accelerate training on Nvidia GPUs. The TensorFlow XLA runtime optimizes and compiles models for accelerators like TPUs. Drivers initialize devices and offload operations.\nThe challenges include efficiently partitioning and scheduling workloads across heterogeneous devices like multi-GPU nodes. Runtimes must also minimize the overhead of data transfers and synchronization.\nLibraries, runtimes, and drivers provide optimized building blocks that deep learning developers can leverage to tap into accelerator performance without hardware programming expertise. Their optimization is essential for production deployments.\n\n\n10.5.3 Optimizing Compilers\nOptimizing compilers is key in extracting maximum performance and efficiency from hardware accelerators for AI workloads. They apply optimizations spanning algorithmic changes, graph-level transformations, and low-level code generation.\n\nAlgorithm Optimization: Techniques like quantization, pruning, and neural architecture search to enhance model efficiency and match hardware capabilities.\nGraph Optimizations: Graph-level optimizations like operator fusion, rewriting, and layout transformations to optimize performance on target hardware.\nCode Generation: Generating optimized low-level code for accelerators from high-level models and frameworks.\n\nFor example, the TVM open compiler stack applies quantization for a BERT model targeting Arm GPUs. It fuses pointwise convolution operations and transforms the weight layout to optimize memory access. Finally, it emits optimized OpenGL code to run the GPU workload.\nKey compiler optimizations include maximizing parallelism, improving data locality and reuse, minimizing memory footprint, and exploiting custom hardware operations. Compilers build and optimize machine learning workloads holistically across hardware components like CPUs, GPUs, and other accelerators.\nHowever, efficiently mapping complex models introduces challenges like efficiently partitioning workloads across heterogeneous devices. Production-level compilers also require extensive time tuning on representative workloads. Still, optimizing compilers is indispensable in unlocking the full capabilities of AI accelerators.\n\n\n10.5.4 Simulation and Modeling\nSimulation software is important in hardware-software co-design. It enables joint modeling of proposed hardware architectures and software stacks:\n\nHardware Simulation: Platforms like Gem5 allow detailed simulation of hardware components like pipelines, caches, interconnects, and memory hierarchies. Engineers can model hardware changes without physical prototyping (Binkert et al. 2011).\nSoftware Simulation: Compiler stacks like TVM support the simulation of machine learning workloads to estimate performance on target hardware architectures. This assists with software optimizations.\nCo-simulation: Unified platforms like the SCALE-Sim (Samajdar et al. 2018) integrate hardware and software simulation into a single tool. This enables what-if analysis to quantify the system-level impacts of cross-layer optimizations early in the design cycle.\n\n\nBinkert, Nathan, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, et al. 2011. “The Gem5 Simulator.” ACM SIGARCH Computer Architecture News 39 (2): 1–7. https://doi.org/10.1145/2024716.2024718.\n\nSamajdar, Ananda, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. “Scale-Sim: Systolic Cnn Accelerator Simulator.” ArXiv Preprint abs/1811.02883. https://arxiv.org/abs/1811.02883.\nFor example, an FPGA-based AI accelerator design could be simulated using Verilog hardware description language and synthesized into a Gem5 model. Verilog is well-suited for describing the digital logic and interconnects of the accelerator architecture. Verilog allows the designer to specify the datapaths, control logic, on-chip memories, and other components implemented in the FPGA fabric. Once the Verilog design is complete, it can be synthesized into a model that simulates the behavior of the hardware, such as using the Gem5 simulator. Gem5 is useful for this task because it allows the modeling of full systems, including processors, caches, buses, and custom accelerators. Gem5 supports interfacing Verilog models of hardware to the simulation, enabling unified system modeling.\nThe synthesized FPGA accelerator model could then have ML workloads simulated using TVM compiled onto it within the Gem5 environment for unified modeling. TVM allows optimized compilation of ML models onto heterogeneous hardware like FPGAs. Running TVM-compiled workloads on the accelerator within the Gem5 simulation provides an integrated way to validate and refine the hardware design, software stack, and system integration before physically realizing the accelerator on a real FPGA.\nThis type of co-simulation provides estimations of overall metrics like throughput, latency, and power to guide co-design before expensive physical prototyping. They also assist with partitioning optimizations between hardware and software to guide design tradeoffs.\nHowever, accuracy in modeling subtle low-level interactions between components is limited. Quantified simulations are estimates but cannot wholly replace physical prototypes and testing. Still, unified simulation and modeling provide invaluable early insights into system-level optimization opportunities during the co-design process.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1067,7 +1067,7 @@ "section": "10.6 Benchmarking AI Hardware", "text": "10.6 Benchmarking AI Hardware\nBenchmarking is a critical process that quantifies and compares the performance of various hardware platforms designed to speed up artificial intelligence applications. It guides purchasing decisions, development focus, and performance optimization efforts for hardware manufacturers and software developers.\nThe benchmarking chapter explores this topic in great detail, explaining why it has become an indispensable part of the AI hardware development cycle and how it impacts the broader technology landscape. Here, we will briefly review the main concepts, but we recommend that you refer to the chapter for more details.\nBenchmarking suites such as MLPerf, Fathom, and AI Benchmark offer a set of standardized tests that can be used across different hardware platforms. These suites measure AI accelerator performance across various neural networks and machine learning tasks, from basic image classification to complex language processing. Providing a common ground for Comparison, they help ensure that performance claims are consistent and verifiable. These “tools” are applied not only to guide the development of hardware but also to ensure that the software stack leverages the full potential of the underlying architecture.\n\nMLPerf: Includes a broad set of benchmarks covering both training (Mattson et al. 2020) and inference (Reddi et al. 2020) for a range of machine learning tasks.\nFathom: Focuses on core operations in deep learning models, emphasizing their execution on different architectures (Adolf et al. 2016).\nAI Benchmark: Targets mobile and consumer devices, assessing AI performance in end-user applications (Ignatov et al. 2018).\n\n\nMattson, Peter, Vijay Janapa Reddi, Christine Cheng, Cody Coleman, Greg Diamos, David Kanter, Paulius Micikevicius, et al. 2020. “MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance.” IEEE Micro 40 (2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\n\nReddi, Vijay Janapa, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, et al. 2020. “MLPerf Inference Benchmark.” In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 446–59. IEEE; IEEE. https://doi.org/10.1109/isca45697.2020.00045.\n\nAdolf, Robert, Saketh Rama, Brandon Reagen, Gu-yeon Wei, and David Brooks. 2016. “Fathom: Reference Workloads for Modern Deep Learning Methods.” In 2016 IEEE International Symposium on Workload Characterization (IISWC), 1–10. IEEE; IEEE. https://doi.org/10.1109/iiswc.2016.7581275.\n\nIgnatov, Andrey, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim Hartley, and Luc Van Gool. 2018. “AI Benchmark: Running Deep Neural Networks on Android Smartphones,” 0–0.\nBenchmarks also have performance metrics that are the quantifiable measures used to evaluate the effectiveness of AI accelerators. These metrics provide a comprehensive view of an accelerator’s capabilities and are used to guide the design and selection process for AI systems. Common metrics include:\n\nThroughput: Usually measured in operations per second, this metric indicates the volume of computations an accelerator can handle.\nLatency: The time delay from input to output in a system is vital for real-time processing tasks.\nEnergy Efficiency: Calculated as computations per watt, representing the tradeoff between performance and power consumption.\nCost Efficiency: This evaluates the cost of operation relative to performance, an essential metric for budget-conscious deployments.\nAccuracy: In inference tasks, the precision of computations is critical and sometimes balanced against speed.\nScalability: The ability of the system to maintain performance gains as the computational load scales up.\n\nBenchmark results give insights beyond just numbers—they can reveal bottlenecks in the software and hardware stack. For example, benchmarks may show how increased batch size improves GPU utilization by providing more parallelism or how compiler optimizations boost TPU performance. These learnings enable continuous optimization (Zhihao Jia, Zaharia, and Aiken 2019).\n\nJia, Zhihao, Matei Zaharia, and Alex Aiken. 2019. “Beyond Data and Model Parallelism for Deep Neural Networks.” In Proceedings of Machine Learning and Systems 2019, MLSys 2019, Stanford, CA, USA, March 31 - April 2, 2019, edited by Ameet Talwalkar, Virginia Smith, and Matei Zaharia. mlsys.org. https://proceedings.mlsys.org/book/265.pdf.\n\nZhu, Hongyu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, and Gennady Pekhimenko. 2018. “Benchmarking and Analyzing Deep Neural Network Training.” In 2018 IEEE International Symposium on Workload Characterization (IISWC), 88–100. IEEE; IEEE. https://doi.org/10.1109/iiswc.2018.8573476.\nStandardized benchmarking provides a quantified, comparable evaluation of AI accelerators to inform design, purchasing, and optimization. However, real-world performance validation remains essential as well (H. Zhu et al. 2018).", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1078,7 +1078,7 @@ "section": "10.7 Challenges and Solutions", "text": "10.7 Challenges and Solutions\nAI accelerators offer impressive performance improvements, but significant portability and compatibility challenges often need to be improved in their integration into the broader AI landscape. The crux of the issue lies in the diversity of the AI ecosystem—a vast array of machine learning accelerators, frameworks, and programming languages exist, each with its unique features and requirements.\n\n10.7.1 Portability/Compatibility Issues\nDevelopers frequently encounter difficulties transferring their AI models from one hardware environment to another. For example, a machine learning model developed for a desktop environment in Python using the PyTorch framework, optimized for an Nvidia GPU, may not easily transition to a more constrained device such as the Arduino Nano 33 BLE. This complexity stems from stark differences in programming requirements - Python and PyTorch on the desktop versus a C++ environment on an Arduino, not to mention the shift from x86 architecture to ARM ISA.\nThese divergences highlight the intricacy of portability within AI systems. Moreover, the rapid advancement in AI algorithms and models means that hardware accelerators must continually adapt, creating a moving target for compatibility. The absence of universal standards and interfaces compounds the issue, making deploying AI solutions consistently across various devices and platforms challenging.\n\nSolutions and Strategies\nTo address these hurdles, the AI industry is moving towards several solutions:\n\nStandardization Initiatives\nThe Open Neural Network Exchange (ONNX) is at the forefront of this pursuit, proposing an open and shared ecosystem that promotes model interchangeability. ONNX facilitates the use of AI models across various frameworks, allowing models trained in one environment to be efficiently deployed in another, significantly reducing the need for time-consuming rewrites or adjustments.\n\n\nCross-Platform Frameworks\nComplementing the standardization efforts, cross-platform frameworks such as TensorFlow Lite and PyTorch Mobile have been developed specifically to create cohesion between diverse computational environments ranging from desktops to mobile and embedded devices. These frameworks offer streamlined, lightweight versions of their parent frameworks, ensuring compatibility and functional integrity across different hardware types without sacrificing performance. This ensures that developers can create applications with the confidence that they will work on many devices, bridging a gap that has traditionally posed a considerable challenge in AI development.\n\n\nHardware-agnostic Platforms\nThe rise of hardware-agnostic platforms has also played an important role in democratizing the use of AI. By creating environments where AI applications can be executed on various accelerators, these platforms remove the burden of hardware-specific coding from developers. This abstraction simplifies the development process and opens up new possibilities for innovation and application deployment, free from the constraints of hardware specifications.\n\n\nAdvanced Compilation Tools\nIn addition, the advent of advanced compilation tools like TVM, an end-to-end tensor compiler, offers an optimized path through the jungle of diverse hardware architectures. TVM equips developers with the means to fine-tune machine learning models for a broad spectrum of computational substrates, ensuring optimal performance and avoiding manual model adjustment each time there is a shift in the underlying hardware.\n\n\nCommunity and Industry Collaboration\nThe collaboration between open-source communities and industry consortia cannot be understated. These collective bodies are instrumental in forming shared standards and best practices that all developers and manufacturers can adhere to. Such collaboration fosters a more unified and synergistic AI ecosystem, significantly diminishing the prevalence of portability issues and smoothing the path toward global AI integration and advancement. Through these combined efforts, AI is steadily moving toward a future where seamless model deployment across various platforms becomes a standard rather than an exception.\nSolving the portability challenges is crucial for the AI field to realize the full potential of hardware accelerators in a dynamic and diverse technological landscape. It requires a concerted effort from hardware manufacturers, software developers, and standard bodies to create a more interoperable and flexible environment. With continued innovation and collaboration, the AI community can pave the way for seamless integration and deployment of AI models across many platforms.\n\n\n\n\n10.7.2 Power Consumption Concerns\nPower consumption is a crucial issue in the development and operation of data center AI accelerators, like Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) (N. P. Jouppi et al. 2017b) (Norrie et al. 2021) (N. Jouppi et al. 2023). These powerful components are the backbone of contemporary AI infrastructure, but their high energy demands contribute to the environmental impact of technology and drive up operational costs significantly. As data processing needs become more complex, with the popularity of AI and deep learning increasing, there’s a pressing demand for GPUs and TPUs that can deliver the necessary computational power more efficiently. The impact of such advancements is two-fold: they can lower these technologies’ environmental footprint and reduce the cost of running AI applications.\n\n———, et al. 2017b. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” In Proceedings of the 44th Annual International Symposium on Computer Architecture, 1–12. ISCA ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\n\nNorrie, Thomas, Nishant Patil, Doe Hyun Yoon, George Kurian, Sheng Li, James Laudon, Cliff Young, Norman Jouppi, and David Patterson. 2021. “The Design Process for Google’s Training Chips: Tpuv2 and TPUv3.” IEEE Micro 41 (2): 56–63. https://doi.org/10.1109/mm.2021.3058217.\n\nJouppi, Norm, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, et al. 2023. “TPU V4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.” In Proceedings of the 50th Annual International Symposium on Computer Architecture. ISCA ’23. New York, NY, USA: ACM. https://doi.org/10.1145/3579371.3589350.\nEmerging hardware technologies are at the cusp of revolutionizing power efficiency in this sector. Photonic computing, for instance, uses light rather than electricity to carry information, offering a promise of high-speed processing with a fraction of the power usage. We delve deeper into this and other innovative technologies in the “Emerging Hardware Technologies” section, exploring their potential to address current power consumption challenges.\nAt the edge of the network, AI accelerators are engineered to process data on devices like smartphones, IoT sensors, and smart wearables. These devices often work under severe power limitations, necessitating a careful balancing act between performance and power usage. A high-performance AI model may provide quick results but at the cost of depleting battery life swiftly and increasing thermal output, which may affect the device’s functionality and durability. The stakes are higher for devices deployed in remote or hard-to-reach areas, where consistent power supply cannot be guaranteed, underscoring the need for low-power-consuming solutions.\nLatency issues further compound the challenge of power efficiency at the edge. Edge AI applications in fields such as autonomous driving and healthcare monitoring require speed, precision, and reliability, as delays in processing can lead to serious safety risks. For these applications, developers must optimize both the AI algorithms and the hardware design to strike an optimal balance between power consumption and latency.\nThis optimization effort is not just about making incremental improvements to existing technologies; it’s about rethinking how and where we process AI tasks. By designing AI accelerators that are both power-efficient and capable of quick processing, we can ensure these devices serve their intended purposes without unnecessary energy use or compromised performance. Such developments could propel the widespread adoption of AI across various sectors, enabling smarter, safer, and more sustainable use of technology.\n\n\n10.7.3 Overcoming Resource Constraints\nResource constraints also pose a significant challenge for Edge AI accelerators, as these specialized hardware and software solutions must deliver robust performance within the limitations of edge devices. Due to power and size limitations, edge AI accelerators often have restricted computation, memory, and storage capacity (L. Zhu et al. 2023). This scarcity of resources necessitates a careful allocation of processing capabilities to execute machine learning models efficiently.\n\nZhu, Ligeng, Lanxiang Hu, Ji Lin, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song Han. 2023. “PockEngine: Sparse and Efficient Fine-Tuning in a Pocket.” In 56th Annual IEEE/ACM International Symposium on Microarchitecture. ACM. https://doi.org/10.1145/3613424.3614307.\n\nLin, Ji, Jiaming Tang, Haotian Tang, Shang Yang, Xingyu Dang, and Song Han. 2023. “AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.” arXiv.\n\nLi, Yuhang, Xin Dong, and Wei Wang. 2020. “Additive Powers-of-Two Quantization: An Efficient Non-Uniform Discretization for Neural Networks.” In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=BkgXT24tDS.\n\nWang, Tianzhe, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, and Song Han. 2020. “APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2075–84. IEEE. https://doi.org/10.1109/cvpr42600.2020.00215.\nMoreover, managing constrained resources demands innovative approaches, including model quantization (Lin et al. 2023) (Li, Dong, and Wang 2020), pruning (Wang et al. 2020), and optimizing inference pipelines. Edge AI accelerators must strike a delicate balance between providing meaningful AI functionality and not exhausting available resources while maintaining low power consumption. Overcoming these resource constraints is crucial to ensure the successful deployment of AI at the edge, where many applications, from IoT to mobile devices, rely on efficiently using limited hardware resources to deliver real-time and intelligent decision-making.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1087,9 +1087,9 @@ "href": "contents/hw_acceleration/hw_acceleration.html#emerging-technologies", "title": "10  AI Acceleration", "section": "10.8 Emerging Technologies", - "text": "10.8 Emerging Technologies\nThus far, we have discussed AI hardware technology in the context of conventional von Neumann architecture design and CMOS-based implementation. These specialized AI chips offer benefits like higher throughput and power efficiency but rely on traditional computing principles. The relentless growth in demand for AI computing power is driving innovations in integration methods for AI hardware.\nTwo leading approaches have emerged for maximizing compute density—wafer-scale integration and chiplet-based architectures—which we will discuss in this section. Looking much further ahead, we will examine emerging technologies that diverge from conventional architectures and adopt fundamentally different approaches for AI-specialized computing.\nSome of these unconventional paradigms include neuromorphic computing, which mimics biological neural networks; quantum computing, which leverages quantum mechanical effects; and optical computing, which utilizes photons instead of electrons. Beyond novel computing substrates, new device technologies are enabling additional gains through better memory and interconnecting.\nExamples include memristors for in-memory computing and nanophotonics for integrated photonic communication. Together, these technologies offer the potential for orders of magnitude improvements in speed, efficiency, and scalability compared to current AI hardware. We will examine these in this section.\n\n10.8.1 Integration Methods\nIntegration methods refer to the approaches used to combine and interconnect an AI chip or system’s various computational and memory components. By closely linking the key processing elements, integration aims to maximize performance, power efficiency, and density.\nIn the past, AI computing was primarily performed on CPUs and GPUs built using conventional integration methods. These discrete components were manufactured separately and connected together on a board. However, this loose integration creates bottlenecks, such as data transfer overheads.\nAs AI workloads have grown, there is increasing demand for tighter integration between computing, memory, and communication elements. Some key drivers of integration include:\n\nMinimizing data movement: Tight integration reduces latency and power for moving data between components. This improves efficiency.\nCustomization: Tailoring all system components to AI workloads allows optimizations throughout the hardware stack.\nParallelism: Integrating many processing elements enables massively parallel computation.\nDensity: Tighter integration allows more transistors and memory to be packed into a given area.\nCost: Economies of scale from large integrated systems can reduce costs.\n\nIn response, new manufacturing techniques like wafer-scale fabrication and advanced packaging now allow much higher levels of integration. The goal is to create unified, specialized AI compute complexes tailored for deep learning and other AI algorithms. Tighter integration is key to delivering the performance and efficiency needed for the next generation of AI.\n\nWafer-scale AI\nWafer-scale AI takes an extremely integrated approach, manufacturing an entire silicon wafer as one gigantic chip. This differs drastically from conventional CPUs and GPUs, which cut each wafer into many smaller individual chips. Figure 10.4 shows a comparison between Cerebras Wafer Scale Engine 2, which is the largest chip ever built, and the largest GPU. While some GPUs may contain billions of transistors, they still pale in Comparison to the scale of a wafer-size chip with over a trillion transistors.\nThe wafer-scale approach also diverges from more modular system-on-chip designs that still have discrete components communicating by bus. Instead, wafer-scale AI enables full customization and tight integration of computation, memory, and interconnects across the entire die.\n\n\n\n\n\n\nFigure 10.4: Wafer-scale vs. GPU. Credit: Cerebras.\n\n\n\nBy designing the wafer as one integrated logic unit, data transfer between elements is minimized. This provides lower latency and power consumption than discrete system-on-chip or chiplet designs. While chiplets can offer flexibility by mixing and matching components, communication between chiplets is challenging. The monolithic nature of wafer-scale integration eliminates these inter-chip communication bottlenecks.\nHowever, the ultra-large-scale also poses difficulties for manufacturability and yield with wafer-scale designs. Defects in any region of the wafer can make (certain parts of) the chip unusable. Specialized lithography techniques are required to produce such large dies. So, wafer-scale integration pursues the maximum performance gains from integration but requires overcoming substantial fabrication challenges.\nThe following video will provide additional context.\n\n\n\nChiplets for AI\nChiplet design refers to a semiconductor architecture in which a single integrated circuit (IC) is constructed from multiple smaller, individual components known as chiplets. Each chiplet is a self-contained functional block, typically specialized for a specific task or functionality. These chiplets are then interconnected on a larger substrate or package to create a cohesive system. Figure 10.5 illustrates this concept. For AI hardware, chiplets enable the mixing of different types of chips optimized for tasks like matrix multiplication, data movement, analog I/O, and specialized memories. This heterogeneous integration differs greatly from wafer-scale integration, where all logic is manufactured as one monolithic chip. Companies like Intel and AMD have adopted chiplet designs for their CPUs.\nChiplets are interconnected using advanced packaging techniques like high-density substrate interposers, 2.5D/3D stacking, and wafer-level packaging. This allows combining chiplets fabricated with different process nodes, specialized memories, and various optimized AI engines.\n\n\n\n\n\n\nFigure 10.5: Chiplet partitioning. Credit: Vivet et al. (2021).\n\n\nVivet, Pascal, Eric Guthmuller, Yvain Thonnart, Gael Pillonnet, Cesar Fuguet, Ivan Miro-Panades, Guillaume Moritz, et al. 2021. “IntAct: A 96-Core Processor with Six Chiplets 3D-Stacked on an Active Interposer with Distributed Interconnects and Integrated Power Management.” IEEE J. Solid-State Circuits 56 (1): 79–97. https://doi.org/10.1109/jssc.2020.3036341.\n\n\nSome key advantages of using chiplets for AI include:\n\nFlexibility: Flexibility: Chiplets allow for the combination of different chip types, process nodes, and memories tailored for each function. This is more modular versus a fixed wafer-scale design.\nYield: Smaller chiplets have a higher yield than a gigantic wafer-scale chip. Defects are contained in individual chiplets.\nCost: Leverages existing manufacturing capabilities versus requiring specialized new processes. Reduces costs by reusing mature fabrication.\nCompatibility: Can integrate with more conventional system architectures like PCIe and standard DDR memory interfaces.\n\nHowever, chiplets also face integration and performance challenges:\n\nLower density compared to wafer-scale, as chiplets are limited in size.\nAdded latency when communicating between chiplets versus monolithic integration. Requires optimization for low-latency interconnect.\nAdvanced packaging adds complexity versus wafer-scale integration, though this is arguable.\n\nThe key objective of chiplets is finding the right balance between modular flexibility and integration density for optimal AI performance. Chiplets aim for efficient AI acceleration while working within the constraints of conventional manufacturing techniques. Chiplets take a middle path between the extremes of wafer-scale integration and fully discrete components. This provides practical benefits but may sacrifice some computational density and efficiency versus a theoretical wafer-size system.\n\n\n\n10.8.2 Neuromorphic Computing\nNeuromorphic computing is an emerging field aiming to emulate the efficiency and robustness of biological neural systems for machine learning applications. A key difference from classical Von Neumann architectures is the merging of memory and processing in the same circuit (Schuman et al. 2022; Marković et al. 2020; Furber 2016), as illustrated in Figure 10.6. The structure of the brain inspires this integrated approach. A key advantage is the potential for orders of magnitude improvement in energy-efficient computation compared to conventional AI hardware. For example, estimates project 100x-1000x gains in energy efficiency versus current GPU-based systems for equivalent workloads.\n\nMarković, Danijela, Alice Mizrahi, Damien Querlioz, and Julie Grollier. 2020. “Physics for Neuromorphic Computing.” Nature Reviews Physics 2 (9): 499–510. https://doi.org/10.1038/s42254-020-0208-2.\n\nFurber, Steve. 2016. “Large-Scale Neuromorphic Computing Systems.” J. Neural Eng. 13 (5): 051001. https://doi.org/10.1088/1741-2560/13/5/051001.\n\n\n\n\n\n\nFigure 10.6: Comparison of the von Neumann architecture with the neuromorphic architecture. Credit: Schuman et al. (2022).\n\n\nSchuman, Catherine D., Shruti R. Kulkarni, Maryam Parsa, J. Parker Mitchell, Prasanna Date, and Bill Kay. 2022. “Opportunities for Neuromorphic Computing Algorithms and Applications.” Nature Computational Science 2 (1): 10–19. https://doi.org/10.1038/s43588-021-00184-y.\n\n\nIntel and IBM are leading commercial efforts in neuromorphic hardware. Intel’s Loihi and Loihi 2 chips (Davies et al. 2018, 2021) offer programmable neuromorphic cores with on-chip learning. IBM’s Northpole (Modha et al. 2023) device comprises over 100 million magnetic tunnel junction synapses and 68 billion transistors. These specialized chips deliver benefits like low power consumption for edge inference.\n\nDavies, Mike, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, et al. 2018. “Loihi: A Neuromorphic Manycore Processor with on-Chip Learning.” IEEE Micro 38 (1): 82–99. https://doi.org/10.1109/mm.2018.112130359.\n\nDavies, Mike, Andreas Wild, Garrick Orchard, Yulia Sandamirskaya, Gabriel A. Fonseca Guerra, Prasad Joshi, Philipp Plank, and Sumedh R. Risbud. 2021. “Advancing Neuromorphic Computing with Loihi: A Survey of Results and Outlook.” Proc. IEEE 109 (5): 911–34. https://doi.org/10.1109/jproc.2021.3067593.\n\nModha, Dharmendra S., Filipp Akopyan, Alexander Andreopoulos, Rathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta, et al. 2023. “Neural Inference at the Frontier of Energy, Space, and Time.” Science 382 (6668): 329–35. https://doi.org/10.1126/science.adh1174.\n\nMaass, Wolfgang. 1997. “Networks of Spiking Neurons: The Third Generation of Neural Network Models.” Neural Networks 10 (9): 1659–71. https://doi.org/10.1016/s0893-6080(97)00011-7.\nSpiking neural networks (SNNs) (Maass 1997) are computational models for neuromorphic hardware. Unlike deep neural networks communicating via continuous values, SNNs use discrete spikes that are more akin to biological neurons. This allows efficient event-based computation rather than constant processing. Additionally, SNNs consider the temporal and spatial characteristics of input data. This better mimics biological neural networks, where the timing of neuronal spikes plays an important role. However, training SNNs remains challenging due to the added temporal complexity. Figure 10.7 provides an overview of the spiking methodology: (a) Diagram of a neuron; (b) Measuring an action potential propagated along the axon of a neuron. Only the action potential is detectable along the axon; (c) The neuron’s spike is approximated with a binary representation; (d) Event-Driven Processing; (e) Active Pixel Sensor and Dynamic Vision Sensor.\nYou can also watch the video linked below for a more detailed explanation.\n\n\n\n\n\n\nFigure 10.7: Neuromoprhic spiking. Credit: Eshraghian et al. (2023).\n\n\nEshraghian, Jason K., Max Ward, Emre O. Neftci, Xinxin Wang, Gregor Lenz, Girish Dwivedi, Mohammed Bennamoun, Doo Seok Jeong, and Wei D. Lu. 2023. “Training Spiking Neural Networks Using Lessons from Deep Learning.” Proc. IEEE 111 (9): 1016–54. https://doi.org/10.1109/jproc.2023.3308088.\n\n\n\nSpecialized nanoelectronic devices called memristors (Chua 1971) are synaptic components in neuromorphic systems. Memristors act as nonvolatile memory with adjustable conductance, emulating the plasticity of real synapses. Memristors enable in-situ learning without separate data transfers by combining memory and processing functions. However, memristor technology has yet to reach maturity and scalability for commercial hardware.\n\nChua, L. 1971. “Memristor-the Missing Circuit Element.” #IEEE_J_CT# 18 (5): 507–19. https://doi.org/10.1109/tct.1971.1083337.\nThe integration of photonics with neuromorphic computing (Shastri et al. 2021) has recently emerged as an active research area. Using light for computation and communication allows high speeds and reduced energy consumption. However, fully realizing photonic neuromorphic systems requires overcoming design and integration challenges.\nNeuromorphic computing offers promising capabilities for efficient edge inference but faces obstacles around training algorithms, nanodevice integration, and system design. Ongoing multidisciplinary research across computer science, engineering, materials science, and physics will be key to unlocking this technology’s full potential for AI use cases.\n\n\n10.8.3 Analog Computing\nAnalog computing is an emerging approach that uses analog signals and components like capacitors, inductors, and amplifiers rather than digital logic for computing. It represents information as continuous electrical signals instead of discrete 0s and 1s. This allows the computation to directly reflect the analog nature of real-world data, avoiding digitization errors and overhead.\nAnalog computing has generated renewed interest in efficient AI hardware, particularly for inference directly on low-power edge devices. Analog circuits, such as multiplication and summation at the core of neural networks, can be used with very low energy consumption. This makes analog well-suited for deploying ML models on energy-constrained end nodes. Startups like Mythic are developing analog AI accelerators.\nWhile analog computing was popular in early computers, the boom of digital logic led to its decline. However, analog is compelling for niche applications requiring extreme efficiency (Haensch, Gokmen, and Puri 2019). It contrasts with digital neuromorphic approaches that still use digital spikes for computation. Analog may allow lower precision computation but requires expertise in analog circuit design. Tradeoffs around precision, programming complexity, and fabrication costs remain active research areas.\n\nHaensch, Wilfried, Tayfun Gokmen, and Ruchir Puri. 2019. “The Next Generation of Deep Learning Hardware: Analog Computing.” Proc. IEEE 107 (1): 108–22. https://doi.org/10.1109/jproc.2018.2871057.\n\nHazan, Avi, and Elishai Ezra Tsur. 2021. “Neuromorphic Analog Implementation of Neural Engineering Framework-Inspired Spiking Neuron for High-Dimensional Representation.” Front. Neurosci. 15 (February): 627221. https://doi.org/10.3389/fnins.2021.627221.\nNeuromorphic computing, which aims to emulate biological neural systems for efficient ML inference, can use analog circuits to implement the key components and behaviors of brains. For example, researchers have designed analog circuits to model neurons and synapses using capacitors, transistors, and operational amplifiers (Hazan and Ezra Tsur 2021). The capacitors can exhibit the spiking dynamics of biological neurons, while the amplifiers and transistors provide a weighted summation of inputs to mimic dendrites. Variable resistor technologies like memristors can realize analog synapses with spike-timing-dependent plasticity, which can strengthen or weaken connections based on spiking activity.\nStartups like SynSense have developed analog neuromorphic chips containing these biomimetic components (Bains 2020). This analog approach results in low power consumption and high scalability for edge devices versus complex digital SNN implementations.\n\nBains, Sunny. 2020. “The Business of Building Brains.” Nature Electronics 3 (7): 348–51. https://doi.org/10.1038/s41928-020-0449-1.\nHowever, training analog SNNs on chips remains an open challenge. Overall, analog realization is a promising technique for delivering the efficiency, scalability, and biological plausibility envisioned with neuromorphic computing. The physics of analog components combined with neural architecture design could improve inference efficiency over conventional digital neural networks.\n\n\n10.8.4 Flexible Electronics\nWhile much of the new hardware technology in the ML workspace has been focused on optimizing and making systems more efficient, there’s a parallel trajectory aiming to adapt hardware for specific applications (Gates 2009; Musk et al. 2019; Tang et al. 2023; Tang, He, and Liu 2022; Kwon and Dong 2022). One such avenue is the development of flexible electronics for AI use cases.\n\nGates, Byron D. 2009. “Flexible Electronics.” Science 323 (5921): 1566–67. https://doi.org/10.1126/science.1171230.\n\nTang, Xin, Hao Shen, Siyuan Zhao, Na Li, and Jia Liu. 2023. “Flexible Braincomputer Interfaces.” Nature Electronics 6 (2): 109–18. https://doi.org/10.1038/s41928-022-00913-9.\n\nTang, Xin, Yichun He, and Jia Liu. 2022. “Soft Bioelectronics for Cardiac Interfaces.” Biophysics Reviews 3 (1). https://doi.org/10.1063/5.0069516.\nFlexible electronics refer to electronic circuits and devices fabricated on flexible plastic or polymer substrates rather than rigid silicon. Unlike conventional rigid boards and chips, this allows the electronics to bend, twist, and conform to irregular shapes. Figure 10.8 shows an example of a flexible device prototype that wirelessly measures body temperature, which can be seamlessly integrated into clothing or skin patches. The flexibility and bendability of emerging electronic materials allow them to be integrated into thin, lightweight form factors that are well-suited for embedded AI and TinyML applications.\nFlexible AI hardware can conform to curvy surfaces and operate efficiently with microwatt power budgets. Flexibility also enables rollable or foldable form factors to minimize device footprint and weight, ideal for small, portable smart devices and wearables incorporating TinyML. Another key advantage of flexible electronics compared to conventional technologies is lower manufacturing costs and simpler fabrication processes, which could democratize access to these technologies. While silicon masks and fabrication costs typically cost millions of dollars, flexible hardware typically costs only tens of cents to manufacture (Huang et al. 2011; Biggs et al. 2021). The potential to fabricate flexible electronics directly onto plastic films using high-throughput printing and coating processes can reduce costs and improve manufacturability at scale versus rigid AI chips (Musk et al. 2019).\n\nHuang, Tsung-Ching, Kenjiro Fukuda, Chun-Ming Lo, Yung-Hui Yeh, Tsuyoshi Sekitani, Takao Someya, and Kwang-Ting Cheng. 2011. “Pseudo-CMOS: A Design Style for Low-Cost and Robust Flexible Electronics.” IEEE Trans. Electron Devices 58 (1): 141–50. https://doi.org/10.1109/ted.2010.2088127.\n\nBiggs, John, James Myers, Jedrzej Kufel, Emre Ozer, Simon Craske, Antony Sou, Catherine Ramsdale, Ken Williamson, Richard Price, and Scott White. 2021. “A Natively Flexible 32-Bit Arm Microprocessor.” Nature 595 (7868): 532–36. https://doi.org/10.1038/s41586-021-03625-w.\n\n\n\n\n\n\nFigure 10.8: Flexible device prototype. Credit: Jabil Circuit.\n\n\n\nThe field is enabled by advances in organic semiconductors and nanomaterials that can be deposited on thin, flexible films. However, fabrication remains challenging compared to mature silicon processes. Flexible circuits currently typically exhibit lower performance than rigid equivalents. Still, they promise to transform electronics into lightweight, bendable materials.\nFlexible electronics use cases are well-suited for intimate integration with the human body. Potential medical AI applications include bio-integrated sensors, soft assistive robots, and implants that monitor or stimulate the nervous system intelligently. Specifically, flexible electrode arrays could enable higher-density, less-invasive neural interfaces compared to rigid equivalents.\nTherefore, flexible electronics are ushering in a new era of wearables and body sensors, largely due to innovations in organic transistors. These components allow for more lightweight and bendable electronics, ideal for wearables, electronic skin, and body-conforming medical devices.\nThey are well-suited for bioelectronic devices in terms of biocompatibility, opening avenues for applications in brain and cardiac interfaces. For example, research in flexible brain-computer interfaces and soft bioelectronics for cardiac applications demonstrates the potential for wide-ranging medical applications.\nCompanies and research institutions are not only developing and investing great amounts of resources in flexible electrodes, as showcased in Neuralink’s work (Musk et al. 2019). Still, they are also pushing the boundaries to integrate machine learning models within the systems (Kwon and Dong 2022). These smart sensors aim for a seamless, long-lasting symbiosis with the human body.\n\nMusk, Elon et al. 2019. “An Integrated Brain-Machine Interface Platform with Thousands of Channels.” J. Med. Internet Res. 21 (10): e16194. https://doi.org/10.2196/16194.\n\nKwon, Sun Hwa, and Lin Dong. 2022. “Flexible Sensors and Machine Learning for Heart Monitoring.” Nano Energy 102 (November): 107632. https://doi.org/10.1016/j.nanoen.2022.107632.\n\nSegura Anaya, L. H., Abeer Alsadoon, N. Costadopoulos, and P. W. C. Prasad. 2017. “Ethical Implications of User Perceptions of Wearable Devices.” Sci. Eng. Ethics 24 (1): 1–28. https://doi.org/10.1007/s11948-017-9872-8.\n\nGoodyear, Victoria A. 2017. “Social Media, Apps and Wearable Technologies: Navigating Ethical Dilemmas and Procedures.” Qualitative Research in Sport, Exercise and Health 9 (3): 285–302. https://doi.org/10.1080/2159676x.2017.1303790.\n\nFarah, Martha J. 2005. “Neuroethics: The Practical and the Philosophical.” Trends Cogn. Sci. 9 (1): 34–40. https://doi.org/10.1016/j.tics.2004.12.001.\n\nRoskies, Adina. 2002. “Neuroethics for the New Millenium.” Neuron 35 (1): 21–23. https://doi.org/10.1016/s0896-6273(02)00763-8.\nEthically, incorporating smart, machine-learning-driven sensors within the body raises important questions. Issues surrounding data privacy, informed consent, and the long-term societal implications of such technologies are the focus of ongoing work in neuroethics and bioethics (Segura Anaya et al. 2017; Goodyear 2017; Farah 2005; Roskies 2002). The field is progressing at a pace that necessitates parallel advancements in ethical frameworks to guide the responsible development and deployment of these technologies. While there are limitations and ethical hurdles to overcome, the prospects for flexible electronics are expansive and hold immense promise for future research and applications.\n\n\n10.8.5 Memory Technologies\nMemory technologies are critical to AI hardware, but conventional DDR DRAM and SRAM create bottlenecks. AI workloads require high bandwidth (>1 TB/s). Extreme scientific applications of AI require extremely low latency (<50 ns) to feed data to compute units (Duarte et al. 2022), high density (>128Gb) to store large model parameters and data sets, and excellent energy efficiency (<100 fJ/b) for embedded use (Verma et al. 2019). New memories are needed to meet these demands. Emerging options include several new technologies:\n\nDuarte, Javier, Nhan Tran, Ben Hawks, Christian Herwig, Jules Muhizi, Shvetank Prakash, and Vijay Janapa Reddi. 2022. “FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning.” ArXiv Preprint abs/2207.07958. https://arxiv.org/abs/2207.07958.\n\nVerma, Naveen, Hongyang Jia, Hossein Valavi, Yinqi Tang, Murat Ozatay, Lung-Yen Chen, Bonan Zhang, and Peter Deaville. 2019. “In-Memory Computing: Advances and Prospects.” IEEE Solid-State Circuits Mag. 11 (3): 43–55. https://doi.org/10.1109/mssc.2019.2922889.\n\nResistive RAM (ReRAM) can improve density with simple, passive arrays. However, challenges around variability remain (Chi et al. 2016).\nPhase change memory (PCM) exploits the unique properties of chalcogenide glass. Crystalline and amorphous phases have different resistances. Intel’s Optane DCPMM provides fast (100ns), high endurance PCM. However, challenges include limited write cycles and high reset current (Burr et al. 2016).\n3D stacking can also boost memory density and bandwidth by vertically integrating memory layers with TSV interconnects (Loh 2008). For example, HBM provides 1024-bit wide interfaces.\n\n\nBurr, Geoffrey W., Matthew J. BrightSky, Abu Sebastian, Huai-Yu Cheng, Jau-Yi Wu, Sangbum Kim, Norma E. Sosa, et al. 2016. “Recent Progress in Phase-Change\\(<\\)?Pub _Newline ?\\(>\\)Memory Technology.” IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6 (2): 146–62. https://doi.org/10.1109/jetcas.2016.2547718.\n\nLoh, Gabriel H. 2008. “3D-Stacked Memory Architectures for Multi-Core Processors.” ACM SIGARCH Computer Architecture News 36 (3): 453–64. https://doi.org/10.1145/1394608.1382159.\nNew memory technologies, with their innovative cell architectures and materials, are critical to unlocking the next level of AI hardware performance and efficiency. Realizing their benefits in commercial systems remains an ongoing challenge.\nIn-memory computing is gaining traction as a promising avenue for optimizing machine learning and high-performance computing workloads. At its core, the technology co-locates data storage and computation to improve energy efficiency and reduce latency Wong et al. (2012). Two key technologies under this umbrella are Resistive RAM (ReRAM) and Processing-In-Memory (PIM).\n\nWong, H.-S. Philip, Heng-Yuan Lee, Shimeng Yu, Yu-Sheng Chen, Yi Wu, Pang-Shiu Chen, Byoungil Lee, Frederick T. Chen, and Ming-Jinn Tsai. 2012. “MetalOxide RRAM.” Proc. IEEE 100 (6): 1951–70. https://doi.org/10.1109/jproc.2012.2190369.\n\nChi, Ping, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. “Prime: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory.” ACM SIGARCH Computer Architecture News 44 (3): 27–39. https://doi.org/10.1145/3007787.3001140.\nReRAM (Wong et al. 2012) and PIM (Chi et al. 2016) are the backbones for in-memory computing, storing and computing data in the same location. ReRAM focuses on issues of uniformity, endurance, retention, multi-bit operation, and scalability. On the other hand, PIM involves CPU units integrated directly into memory arrays, specialized for tasks like matrix multiplication, which are central in AI computations.\nThese technologies find applications in AI workloads and high-performance computing, where the synergy of storage and computation can lead to significant performance gains. The architecture is particularly useful for compute-intensive tasks common in machine learning models.\nWhile in-memory computing technologies like ReRAM and PIM offer exciting prospects for efficiency and performance, they come with their own challenges, such as data uniformity and scalability issues in ReRAM (Imani, Rahimi, and S. Rosing 2016). Nonetheless, the field is ripe for innovation, and addressing these limitations can open new frontiers in AI and high-performance computing.\n\nImani, Mohsen, Abbas Rahimi, and Tajana S. Rosing. 2016. “Resistive Configurable Associative Memory for Approximate Computing.” In Proceedings of the 2016 Design, Automation &Amp; Test in Europe Conference &Amp; Exhibition (DATE), 1327–32. IEEE; Research Publishing Services. https://doi.org/10.3850/9783981537079_0454.\n\n\n10.8.6 Optical Computing\nIn AI acceleration, a burgeoning area of interest lies in novel technologies that deviate from traditional paradigms. Some emerging technologies mentioned above, such as flexible electronics, in-memory computing, or even neuromorphic computing, are close to becoming a reality, given their ground-breaking innovations and applications. One of the promising and leading next-gen frontiers is optical computing technologies H. Zhou et al. (2022). Companies like [LightMatter] are pioneering the use of light photonics for calculations, thereby utilizing photons instead of electrons for data transmission and computation.\n\nZhou, Hailong, Jianji Dong, Junwei Cheng, Wenchan Dong, Chaoran Huang, Yichen Shen, Qiming Zhang, et al. 2022. “Photonic Matrix Multiplication Lights up Photonic Accelerator and Beyond.” Light: Science &Amp; Applications 11 (1): 30. https://doi.org/10.1038/s41377-022-00717-8.\n\nShastri, Bhavin J., Alexander N. Tait, T. Ferreira de Lima, Wolfram H. P. Pernice, Harish Bhaskaran, C. D. Wright, and Paul R. Prucnal. 2021. “Photonics for Artificial Intelligence and Neuromorphic Computing.” Nat. Photonics 15 (2): 102–14. https://doi.org/10.1038/s41566-020-00754-y.\nOptical computing utilizes photons and photonic devices rather than traditional electronic circuits for computing and data processing. It takes inspiration from fiber optic communication links that rely on light for fast, efficient data transfer (Shastri et al. 2021). Light can propagate with much less loss than semiconductors’ electrons, enabling inherent speed and efficiency benefits.\nSome specific advantages of optical computing include:\n\nHigh throughput: Photons can transmit with bandwidths >100 Tb/s using wavelength division multiplexing.\nLow latency: Photons interact on femtosecond timescales, millions faster than silicon transistors.\nParallelism: Multiple data signals can propagate simultaneously through the same optical medium.\nLow power: Photonic circuits utilizing waveguides and resonators can achieve complex logic and memory with only microwatts of power.\n\nHowever, optical computing currently faces significant challenges:\n\nLack of optical memory equivalent to electronic RAM\nRequires conversion between optical and electrical domains.\nLimited set of available optical components compared to rich electronics ecosystem.\nImmature integration methods to combine photonics with traditional CMOS chips.\nComplex programming models required to handle parallelism.\n\nAs a result, optical computing is still in the very early research stage despite its promising potential. However, technical breakthroughs could enable it to complement electronics and unlock performance gains for AI workloads. Companies like Lightmatter are pioneering early optical AI accelerators. In the long term, if key challenges are overcome, it could represent a revolutionary computing substrate.\n\n\n10.8.7 Quantum Computing\nQuantum computers leverage unique phenomena of quantum physics, like superposition and entanglement, to represent and process information in ways not possible classically. Instead of binary bits, the fundamental unit is the quantum bit or qubit. Unlike classical bits, which are limited to 0 or 1, qubits can exist simultaneously in a superposition of both states due to quantum effects.\nMultiple qubits can also be entangled, leading to exponential information density but introducing probabilistic results. Superposition enables parallel computation on all possible states, while entanglement allows nonlocal correlations between qubits.\nQuantum algorithms carefully manipulate these inherently quantum mechanical effects to solve problems like optimization or search more efficiently than their classical counterparts in theory.\n\nFaster training of deep neural networks by exploiting quantum parallelism for linear algebra operations.\nEfficient quantum ML algorithms make use of the unique capabilities of qubits.\nQuantum neural networks with inherent quantum effects baked into the model architecture.\nQuantum optimizers leveraging quantum annealing or adiabatic algorithms for combinatorial optimization problems.\n\nHowever, quantum states are fragile and prone to errors that require error-correcting protocols. The non-intuitive nature of quantum programming also introduces challenges not present in classical computing.\n\nNoisy and fragile quantum bits are difficult to scale up. The largest quantum computer today has less than 100 qubits.\nRestricted set of available quantum gates and circuits relative to classical programming.\nLack of datasets and benchmarks to evaluate quantum ML in practical domains.\n\nWhile meaningful quantum advantage for ML remains far off, active research at companies like D-Wave, Rigetti, and IonQ is advancing quantum computer engineering and quantum algorithms. Major technology companies like Google, IBM, and Microsoft are actively exploring quantum computing. Google recently announced a 72-qubit quantum processor called Bristlecone and plans to build a 49-qubit commercial quantum system. Microsoft also has an active research program in topological quantum computing and collaborates with quantum startup IonQ\nQuantum techniques may first make inroads into optimization before more generalized ML adoption. Realizing quantum ML’s full potential awaits major milestones in quantum hardware development and ecosystem maturity.", + "text": "10.8 Emerging Technologies\nThus far, we have discussed AI hardware technology in the context of conventional von Neumann architecture design and CMOS-based implementation. These specialized AI chips offer benefits like higher throughput and power efficiency but rely on traditional computing principles. The relentless growth in demand for AI computing power is driving innovations in integration methods for AI hardware.\nTwo leading approaches have emerged for maximizing compute density—wafer-scale integration and chiplet-based architectures—which we will discuss in this section. Looking much further ahead, we will examine emerging technologies that diverge from conventional architectures and adopt fundamentally different approaches for AI-specialized computing.\nSome of these unconventional paradigms include neuromorphic computing, which mimics biological neural networks; quantum computing, which leverages quantum mechanical effects; and optical computing, which utilizes photons instead of electrons. Beyond novel computing substrates, new device technologies are enabling additional gains through better memory and interconnecting.\nExamples include memristors for in-memory computing and nanophotonics for integrated photonic communication. Together, these technologies offer the potential for orders of magnitude improvements in speed, efficiency, and scalability compared to current AI hardware. We will examine these in this section.\n\n10.8.1 Integration Methods\nIntegration methods refer to the approaches used to combine and interconnect an AI chip or system’s various computational and memory components. By closely linking the key processing elements, integration aims to maximize performance, power efficiency, and density.\nIn the past, AI computing was primarily performed on CPUs and GPUs built using conventional integration methods. These discrete components were manufactured separately and connected together on a board. However, this loose integration creates bottlenecks, such as data transfer overheads.\nAs AI workloads have grown, there is increasing demand for tighter integration between computing, memory, and communication elements. Some key drivers of integration include:\n\nMinimizing data movement: Tight integration reduces latency and power for moving data between components. This improves efficiency.\nCustomization: Tailoring all system components to AI workloads allows optimizations throughout the hardware stack.\nParallelism: Integrating many processing elements enables massively parallel computation.\nDensity: Tighter integration allows more transistors and memory to be packed into a given area.\nCost: Economies of scale from large integrated systems can reduce costs.\n\nIn response, new manufacturing techniques like wafer-scale fabrication and advanced packaging now allow much higher levels of integration. The goal is to create unified, specialized AI compute complexes tailored for deep learning and other AI algorithms. Tighter integration is key to delivering the performance and efficiency needed for the next generation of AI.\n\nWafer-scale AI\nWafer-scale AI takes an extremely integrated approach, manufacturing an entire silicon wafer as one gigantic chip. This differs drastically from conventional CPUs and GPUs, which cut each wafer into many smaller individual chips. Figure 10.4 shows a comparison between Cerebras Wafer Scale Engine 2, which is the largest chip ever built, and the largest GPU. While some GPUs may contain billions of transistors, they still pale in Comparison to the scale of a wafer-size chip with over a trillion transistors.\nThe wafer-scale approach also diverges from more modular system-on-chip designs that still have discrete components communicating by bus. Instead, wafer-scale AI enables full customization and tight integration of computation, memory, and interconnects across the entire die.\n\n\n\n\n\n\nFigure 10.4: Wafer-scale vs. GPU. Credit: Cerebras.\n\n\n\nBy designing the wafer as one integrated logic unit, data transfer between elements is minimized. This provides lower latency and power consumption than discrete system-on-chip or chiplet designs. While chiplets can offer flexibility by mixing and matching components, communication between chiplets is challenging. The monolithic nature of wafer-scale integration eliminates these inter-chip communication bottlenecks.\nHowever, the ultra-large-scale also poses difficulties for manufacturability and yield with wafer-scale designs. Defects in any region of the wafer can make (certain parts of) the chip unusable. Specialized lithography techniques are required to produce such large dies. So, wafer-scale integration pursues the maximum performance gains from integration but requires overcoming substantial fabrication challenges.\nVideo 10.1 provides additional context about wafer-scale AI chips.\n\n\n\n\n\n\nVideo 10.1: Wafer-scale AI Chips\n\n\n\n\n\n\n\n\nChiplets for AI\nChiplet design refers to a semiconductor architecture in which a single integrated circuit (IC) is constructed from multiple smaller, individual components known as chiplets. Each chiplet is a self-contained functional block, typically specialized for a specific task or functionality. These chiplets are then interconnected on a larger substrate or package to create a cohesive system. Figure 10.5 illustrates this concept. For AI hardware, chiplets enable the mixing of different types of chips optimized for tasks like matrix multiplication, data movement, analog I/O, and specialized memories. This heterogeneous integration differs greatly from wafer-scale integration, where all logic is manufactured as one monolithic chip. Companies like Intel and AMD have adopted chiplet designs for their CPUs.\nChiplets are interconnected using advanced packaging techniques like high-density substrate interposers, 2.5D/3D stacking, and wafer-level packaging. This allows combining chiplets fabricated with different process nodes, specialized memories, and various optimized AI engines.\n\n\n\n\n\n\nFigure 10.5: Chiplet partitioning. Credit: Vivet et al. (2021).\n\n\nVivet, Pascal, Eric Guthmuller, Yvain Thonnart, Gael Pillonnet, Cesar Fuguet, Ivan Miro-Panades, Guillaume Moritz, et al. 2021. “IntAct: A 96-Core Processor with Six Chiplets 3D-Stacked on an Active Interposer with Distributed Interconnects and Integrated Power Management.” IEEE J. Solid-State Circuits 56 (1): 79–97. https://doi.org/10.1109/jssc.2020.3036341.\n\n\nSome key advantages of using chiplets for AI include:\n\nFlexibility: Flexibility: Chiplets allow for the combination of different chip types, process nodes, and memories tailored for each function. This is more modular versus a fixed wafer-scale design.\nYield: Smaller chiplets have a higher yield than a gigantic wafer-scale chip. Defects are contained in individual chiplets.\nCost: Leverages existing manufacturing capabilities versus requiring specialized new processes. Reduces costs by reusing mature fabrication.\nCompatibility: Can integrate with more conventional system architectures like PCIe and standard DDR memory interfaces.\n\nHowever, chiplets also face integration and performance challenges:\n\nLower density compared to wafer-scale, as chiplets are limited in size.\nAdded latency when communicating between chiplets versus monolithic integration. Requires optimization for low-latency interconnect.\nAdvanced packaging adds complexity versus wafer-scale integration, though this is arguable.\n\nThe key objective of chiplets is finding the right balance between modular flexibility and integration density for optimal AI performance. Chiplets aim for efficient AI acceleration while working within the constraints of conventional manufacturing techniques. Chiplets take a middle path between the extremes of wafer-scale integration and fully discrete components. This provides practical benefits but may sacrifice some computational density and efficiency versus a theoretical wafer-size system.\n\n\n\n10.8.2 Neuromorphic Computing\nNeuromorphic computing is an emerging field aiming to emulate the efficiency and robustness of biological neural systems for machine learning applications. A key difference from classical Von Neumann architectures is the merging of memory and processing in the same circuit (Schuman et al. 2022; Marković et al. 2020; Furber 2016), as illustrated in Figure 10.6. The structure of the brain inspires this integrated approach. A key advantage is the potential for orders of magnitude improvement in energy-efficient computation compared to conventional AI hardware. For example, estimates project 100x-1000x gains in energy efficiency versus current GPU-based systems for equivalent workloads.\n\nMarković, Danijela, Alice Mizrahi, Damien Querlioz, and Julie Grollier. 2020. “Physics for Neuromorphic Computing.” Nature Reviews Physics 2 (9): 499–510. https://doi.org/10.1038/s42254-020-0208-2.\n\nFurber, Steve. 2016. “Large-Scale Neuromorphic Computing Systems.” J. Neural Eng. 13 (5): 051001. https://doi.org/10.1088/1741-2560/13/5/051001.\n\n\n\n\n\n\nFigure 10.6: Comparison of the von Neumann architecture with the neuromorphic architecture. Credit: Schuman et al. (2022).\n\n\nSchuman, Catherine D., Shruti R. Kulkarni, Maryam Parsa, J. Parker Mitchell, Prasanna Date, and Bill Kay. 2022. “Opportunities for Neuromorphic Computing Algorithms and Applications.” Nature Computational Science 2 (1): 10–19. https://doi.org/10.1038/s43588-021-00184-y.\n\n\nIntel and IBM are leading commercial efforts in neuromorphic hardware. Intel’s Loihi and Loihi 2 chips (Davies et al. 2018, 2021) offer programmable neuromorphic cores with on-chip learning. IBM’s Northpole (Modha et al. 2023) device comprises over 100 million magnetic tunnel junction synapses and 68 billion transistors. These specialized chips deliver benefits like low power consumption for edge inference.\n\nDavies, Mike, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, et al. 2018. “Loihi: A Neuromorphic Manycore Processor with on-Chip Learning.” IEEE Micro 38 (1): 82–99. https://doi.org/10.1109/mm.2018.112130359.\n\nDavies, Mike, Andreas Wild, Garrick Orchard, Yulia Sandamirskaya, Gabriel A. Fonseca Guerra, Prasad Joshi, Philipp Plank, and Sumedh R. Risbud. 2021. “Advancing Neuromorphic Computing with Loihi: A Survey of Results and Outlook.” Proc. IEEE 109 (5): 911–34. https://doi.org/10.1109/jproc.2021.3067593.\n\nModha, Dharmendra S., Filipp Akopyan, Alexander Andreopoulos, Rathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta, et al. 2023. “Neural Inference at the Frontier of Energy, Space, and Time.” Science 382 (6668): 329–35. https://doi.org/10.1126/science.adh1174.\n\nMaass, Wolfgang. 1997. “Networks of Spiking Neurons: The Third Generation of Neural Network Models.” Neural Networks 10 (9): 1659–71. https://doi.org/10.1016/s0893-6080(97)00011-7.\nSpiking neural networks (SNNs) (Maass 1997) are computational models for neuromorphic hardware. Unlike deep neural networks communicating via continuous values, SNNs use discrete spikes that are more akin to biological neurons. This allows efficient event-based computation rather than constant processing. Additionally, SNNs consider the temporal and spatial characteristics of input data. This better mimics biological neural networks, where the timing of neuronal spikes plays an important role. However, training SNNs remains challenging due to the added temporal complexity. Figure 10.7 provides an overview of the spiking methodology: (a) Diagram of a neuron; (b) Measuring an action potential propagated along the axon of a neuron. Only the action potential is detectable along the axon; (c) The neuron’s spike is approximated with a binary representation; (d) Event-Driven Processing; (e) Active Pixel Sensor and Dynamic Vision Sensor.\n\n\n\n\n\n\nFigure 10.7: Neuromoprhic spiking. Credit: Eshraghian et al. (2023).\n\n\nEshraghian, Jason K., Max Ward, Emre O. Neftci, Xinxin Wang, Gregor Lenz, Girish Dwivedi, Mohammed Bennamoun, Doo Seok Jeong, and Wei D. Lu. 2023. “Training Spiking Neural Networks Using Lessons from Deep Learning.” Proc. IEEE 111 (9): 1016–54. https://doi.org/10.1109/jproc.2023.3308088.\n\n\nYou can also watch Video 10.2 linked below for a more detailed explanation.\n\n\n\n\n\n\nVideo 10.2: Neuromorphic Computing\n\n\n\n\n\n\nSpecialized nanoelectronic devices called memristors (Chua 1971) are synaptic components in neuromorphic systems. Memristors act as nonvolatile memory with adjustable conductance, emulating the plasticity of real synapses. Memristors enable in-situ learning without separate data transfers by combining memory and processing functions. However, memristor technology has yet to reach maturity and scalability for commercial hardware.\n\nChua, L. 1971. “Memristor-the Missing Circuit Element.” #IEEE_J_CT# 18 (5): 507–19. https://doi.org/10.1109/tct.1971.1083337.\nThe integration of photonics with neuromorphic computing (Shastri et al. 2021) has recently emerged as an active research area. Using light for computation and communication allows high speeds and reduced energy consumption. However, fully realizing photonic neuromorphic systems requires overcoming design and integration challenges.\nNeuromorphic computing offers promising capabilities for efficient edge inference but faces obstacles around training algorithms, nanodevice integration, and system design. Ongoing multidisciplinary research across computer science, engineering, materials science, and physics will be key to unlocking this technology’s full potential for AI use cases.\n\n\n10.8.3 Analog Computing\nAnalog computing is an emerging approach that uses analog signals and components like capacitors, inductors, and amplifiers rather than digital logic for computing. It represents information as continuous electrical signals instead of discrete 0s and 1s. This allows the computation to directly reflect the analog nature of real-world data, avoiding digitization errors and overhead.\nAnalog computing has generated renewed interest in efficient AI hardware, particularly for inference directly on low-power edge devices. Analog circuits, such as multiplication and summation at the core of neural networks, can be used with very low energy consumption. This makes analog well-suited for deploying ML models on energy-constrained end nodes. Startups like Mythic are developing analog AI accelerators.\nWhile analog computing was popular in early computers, the boom of digital logic led to its decline. However, analog is compelling for niche applications requiring extreme efficiency (Haensch, Gokmen, and Puri 2019). It contrasts with digital neuromorphic approaches that still use digital spikes for computation. Analog may allow lower precision computation but requires expertise in analog circuit design. Tradeoffs around precision, programming complexity, and fabrication costs remain active research areas.\n\nHaensch, Wilfried, Tayfun Gokmen, and Ruchir Puri. 2019. “The Next Generation of Deep Learning Hardware: Analog Computing.” Proc. IEEE 107 (1): 108–22. https://doi.org/10.1109/jproc.2018.2871057.\n\nHazan, Avi, and Elishai Ezra Tsur. 2021. “Neuromorphic Analog Implementation of Neural Engineering Framework-Inspired Spiking Neuron for High-Dimensional Representation.” Front. Neurosci. 15 (February): 627221. https://doi.org/10.3389/fnins.2021.627221.\nNeuromorphic computing, which aims to emulate biological neural systems for efficient ML inference, can use analog circuits to implement the key components and behaviors of brains. For example, researchers have designed analog circuits to model neurons and synapses using capacitors, transistors, and operational amplifiers (Hazan and Ezra Tsur 2021). The capacitors can exhibit the spiking dynamics of biological neurons, while the amplifiers and transistors provide a weighted summation of inputs to mimic dendrites. Variable resistor technologies like memristors can realize analog synapses with spike-timing-dependent plasticity, which can strengthen or weaken connections based on spiking activity.\nStartups like SynSense have developed analog neuromorphic chips containing these biomimetic components (Bains 2020). This analog approach results in low power consumption and high scalability for edge devices versus complex digital SNN implementations.\n\nBains, Sunny. 2020. “The Business of Building Brains.” Nature Electronics 3 (7): 348–51. https://doi.org/10.1038/s41928-020-0449-1.\nHowever, training analog SNNs on chips remains an open challenge. Overall, analog realization is a promising technique for delivering the efficiency, scalability, and biological plausibility envisioned with neuromorphic computing. The physics of analog components combined with neural architecture design could improve inference efficiency over conventional digital neural networks.\n\n\n10.8.4 Flexible Electronics\nWhile much of the new hardware technology in the ML workspace has been focused on optimizing and making systems more efficient, there’s a parallel trajectory aiming to adapt hardware for specific applications (Gates 2009; Musk et al. 2019; Tang et al. 2023; Tang, He, and Liu 2022; Kwon and Dong 2022). One such avenue is the development of flexible electronics for AI use cases.\n\nGates, Byron D. 2009. “Flexible Electronics.” Science 323 (5921): 1566–67. https://doi.org/10.1126/science.1171230.\n\nTang, Xin, Hao Shen, Siyuan Zhao, Na Li, and Jia Liu. 2023. “Flexible Braincomputer Interfaces.” Nature Electronics 6 (2): 109–18. https://doi.org/10.1038/s41928-022-00913-9.\n\nTang, Xin, Yichun He, and Jia Liu. 2022. “Soft Bioelectronics for Cardiac Interfaces.” Biophysics Reviews 3 (1). https://doi.org/10.1063/5.0069516.\nFlexible electronics refer to electronic circuits and devices fabricated on flexible plastic or polymer substrates rather than rigid silicon. Unlike conventional rigid boards and chips, this allows the electronics to bend, twist, and conform to irregular shapes. Figure 10.8 shows an example of a flexible device prototype that wirelessly measures body temperature, which can be seamlessly integrated into clothing or skin patches. The flexibility and bendability of emerging electronic materials allow them to be integrated into thin, lightweight form factors that are well-suited for embedded AI and TinyML applications.\nFlexible AI hardware can conform to curvy surfaces and operate efficiently with microwatt power budgets. Flexibility also enables rollable or foldable form factors to minimize device footprint and weight, ideal for small, portable smart devices and wearables incorporating TinyML. Another key advantage of flexible electronics compared to conventional technologies is lower manufacturing costs and simpler fabrication processes, which could democratize access to these technologies. While silicon masks and fabrication costs typically cost millions of dollars, flexible hardware typically costs only tens of cents to manufacture (Huang et al. 2011; Biggs et al. 2021). The potential to fabricate flexible electronics directly onto plastic films using high-throughput printing and coating processes can reduce costs and improve manufacturability at scale versus rigid AI chips (Musk et al. 2019).\n\nHuang, Tsung-Ching, Kenjiro Fukuda, Chun-Ming Lo, Yung-Hui Yeh, Tsuyoshi Sekitani, Takao Someya, and Kwang-Ting Cheng. 2011. “Pseudo-CMOS: A Design Style for Low-Cost and Robust Flexible Electronics.” IEEE Trans. Electron Devices 58 (1): 141–50. https://doi.org/10.1109/ted.2010.2088127.\n\nBiggs, John, James Myers, Jedrzej Kufel, Emre Ozer, Simon Craske, Antony Sou, Catherine Ramsdale, Ken Williamson, Richard Price, and Scott White. 2021. “A Natively Flexible 32-Bit Arm Microprocessor.” Nature 595 (7868): 532–36. https://doi.org/10.1038/s41586-021-03625-w.\n\n\n\n\n\n\nFigure 10.8: Flexible device prototype. Credit: Jabil Circuit.\n\n\n\nThe field is enabled by advances in organic semiconductors and nanomaterials that can be deposited on thin, flexible films. However, fabrication remains challenging compared to mature silicon processes. Flexible circuits currently typically exhibit lower performance than rigid equivalents. Still, they promise to transform electronics into lightweight, bendable materials.\nFlexible electronics use cases are well-suited for intimate integration with the human body. Potential medical AI applications include bio-integrated sensors, soft assistive robots, and implants that monitor or stimulate the nervous system intelligently. Specifically, flexible electrode arrays could enable higher-density, less-invasive neural interfaces compared to rigid equivalents.\nTherefore, flexible electronics are ushering in a new era of wearables and body sensors, largely due to innovations in organic transistors. These components allow for more lightweight and bendable electronics, ideal for wearables, electronic skin, and body-conforming medical devices.\nThey are well-suited for bioelectronic devices in terms of biocompatibility, opening avenues for applications in brain and cardiac interfaces. For example, research in flexible brain-computer interfaces and soft bioelectronics for cardiac applications demonstrates the potential for wide-ranging medical applications.\nCompanies and research institutions are not only developing and investing great amounts of resources in flexible electrodes, as showcased in Neuralink’s work (Musk et al. 2019). Still, they are also pushing the boundaries to integrate machine learning models within the systems (Kwon and Dong 2022). These smart sensors aim for a seamless, long-lasting symbiosis with the human body.\n\nMusk, Elon et al. 2019. “An Integrated Brain-Machine Interface Platform with Thousands of Channels.” J. Med. Internet Res. 21 (10): e16194. https://doi.org/10.2196/16194.\n\nKwon, Sun Hwa, and Lin Dong. 2022. “Flexible Sensors and Machine Learning for Heart Monitoring.” Nano Energy 102 (November): 107632. https://doi.org/10.1016/j.nanoen.2022.107632.\n\nSegura Anaya, L. H., Abeer Alsadoon, N. Costadopoulos, and P. W. C. Prasad. 2017. “Ethical Implications of User Perceptions of Wearable Devices.” Sci. Eng. Ethics 24 (1): 1–28. https://doi.org/10.1007/s11948-017-9872-8.\n\nGoodyear, Victoria A. 2017. “Social Media, Apps and Wearable Technologies: Navigating Ethical Dilemmas and Procedures.” Qualitative Research in Sport, Exercise and Health 9 (3): 285–302. https://doi.org/10.1080/2159676x.2017.1303790.\n\nFarah, Martha J. 2005. “Neuroethics: The Practical and the Philosophical.” Trends Cogn. Sci. 9 (1): 34–40. https://doi.org/10.1016/j.tics.2004.12.001.\n\nRoskies, Adina. 2002. “Neuroethics for the New Millenium.” Neuron 35 (1): 21–23. https://doi.org/10.1016/s0896-6273(02)00763-8.\nEthically, incorporating smart, machine-learning-driven sensors within the body raises important questions. Issues surrounding data privacy, informed consent, and the long-term societal implications of such technologies are the focus of ongoing work in neuroethics and bioethics (Segura Anaya et al. 2017; Goodyear 2017; Farah 2005; Roskies 2002). The field is progressing at a pace that necessitates parallel advancements in ethical frameworks to guide the responsible development and deployment of these technologies. While there are limitations and ethical hurdles to overcome, the prospects for flexible electronics are expansive and hold immense promise for future research and applications.\n\n\n10.8.5 Memory Technologies\nMemory technologies are critical to AI hardware, but conventional DDR DRAM and SRAM create bottlenecks. AI workloads require high bandwidth (>1 TB/s). Extreme scientific applications of AI require extremely low latency (<50 ns) to feed data to compute units (Duarte et al. 2022), high density (>128Gb) to store large model parameters and data sets, and excellent energy efficiency (<100 fJ/b) for embedded use (Verma et al. 2019). New memories are needed to meet these demands. Emerging options include several new technologies:\n\nDuarte, Javier, Nhan Tran, Ben Hawks, Christian Herwig, Jules Muhizi, Shvetank Prakash, and Vijay Janapa Reddi. 2022. “FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning.” ArXiv Preprint abs/2207.07958. https://arxiv.org/abs/2207.07958.\n\nVerma, Naveen, Hongyang Jia, Hossein Valavi, Yinqi Tang, Murat Ozatay, Lung-Yen Chen, Bonan Zhang, and Peter Deaville. 2019. “In-Memory Computing: Advances and Prospects.” IEEE Solid-State Circuits Mag. 11 (3): 43–55. https://doi.org/10.1109/mssc.2019.2922889.\n\nResistive RAM (ReRAM) can improve density with simple, passive arrays. However, challenges around variability remain (Chi et al. 2016).\nPhase change memory (PCM) exploits the unique properties of chalcogenide glass. Crystalline and amorphous phases have different resistances. Intel’s Optane DCPMM provides fast (100ns), high endurance PCM. However, challenges include limited write cycles and high reset current (Burr et al. 2016).\n3D stacking can also boost memory density and bandwidth by vertically integrating memory layers with TSV interconnects (Loh 2008). For example, HBM provides 1024-bit wide interfaces.\n\n\nBurr, Geoffrey W., Matthew J. BrightSky, Abu Sebastian, Huai-Yu Cheng, Jau-Yi Wu, Sangbum Kim, Norma E. Sosa, et al. 2016. “Recent Progress in Phase-Change\\(<\\)?Pub _Newline ?\\(>\\)Memory Technology.” IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6 (2): 146–62. https://doi.org/10.1109/jetcas.2016.2547718.\n\nLoh, Gabriel H. 2008. “3D-Stacked Memory Architectures for Multi-Core Processors.” ACM SIGARCH Computer Architecture News 36 (3): 453–64. https://doi.org/10.1145/1394608.1382159.\nNew memory technologies, with their innovative cell architectures and materials, are critical to unlocking the next level of AI hardware performance and efficiency. Realizing their benefits in commercial systems remains an ongoing challenge.\nIn-memory computing is gaining traction as a promising avenue for optimizing machine learning and high-performance computing workloads. At its core, the technology co-locates data storage and computation to improve energy efficiency and reduce latency Wong et al. (2012). Two key technologies under this umbrella are Resistive RAM (ReRAM) and Processing-In-Memory (PIM).\n\nWong, H.-S. Philip, Heng-Yuan Lee, Shimeng Yu, Yu-Sheng Chen, Yi Wu, Pang-Shiu Chen, Byoungil Lee, Frederick T. Chen, and Ming-Jinn Tsai. 2012. “MetalOxide RRAM.” Proc. IEEE 100 (6): 1951–70. https://doi.org/10.1109/jproc.2012.2190369.\n\nChi, Ping, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. “Prime: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory.” ACM SIGARCH Computer Architecture News 44 (3): 27–39. https://doi.org/10.1145/3007787.3001140.\nReRAM (Wong et al. 2012) and PIM (Chi et al. 2016) are the backbones for in-memory computing, storing and computing data in the same location. ReRAM focuses on issues of uniformity, endurance, retention, multi-bit operation, and scalability. On the other hand, PIM involves CPU units integrated directly into memory arrays, specialized for tasks like matrix multiplication, which are central in AI computations.\nThese technologies find applications in AI workloads and high-performance computing, where the synergy of storage and computation can lead to significant performance gains. The architecture is particularly useful for compute-intensive tasks common in machine learning models.\nWhile in-memory computing technologies like ReRAM and PIM offer exciting prospects for efficiency and performance, they come with their own challenges, such as data uniformity and scalability issues in ReRAM (Imani, Rahimi, and S. Rosing 2016). Nonetheless, the field is ripe for innovation, and addressing these limitations can open new frontiers in AI and high-performance computing.\n\nImani, Mohsen, Abbas Rahimi, and Tajana S. Rosing. 2016. “Resistive Configurable Associative Memory for Approximate Computing.” In Proceedings of the 2016 Design, Automation &Amp; Test in Europe Conference &Amp; Exhibition (DATE), 1327–32. IEEE; Research Publishing Services. https://doi.org/10.3850/9783981537079_0454.\n\n\n10.8.6 Optical Computing\nIn AI acceleration, a burgeoning area of interest lies in novel technologies that deviate from traditional paradigms. Some emerging technologies mentioned above, such as flexible electronics, in-memory computing, or even neuromorphic computing, are close to becoming a reality, given their ground-breaking innovations and applications. One of the promising and leading next-gen frontiers is optical computing technologies H. Zhou et al. (2022). Companies like [LightMatter] are pioneering the use of light photonics for calculations, thereby utilizing photons instead of electrons for data transmission and computation.\n\nZhou, Hailong, Jianji Dong, Junwei Cheng, Wenchan Dong, Chaoran Huang, Yichen Shen, Qiming Zhang, et al. 2022. “Photonic Matrix Multiplication Lights up Photonic Accelerator and Beyond.” Light: Science &Amp; Applications 11 (1): 30. https://doi.org/10.1038/s41377-022-00717-8.\n\nShastri, Bhavin J., Alexander N. Tait, T. Ferreira de Lima, Wolfram H. P. Pernice, Harish Bhaskaran, C. D. Wright, and Paul R. Prucnal. 2021. “Photonics for Artificial Intelligence and Neuromorphic Computing.” Nat. Photonics 15 (2): 102–14. https://doi.org/10.1038/s41566-020-00754-y.\nOptical computing utilizes photons and photonic devices rather than traditional electronic circuits for computing and data processing. It takes inspiration from fiber optic communication links that rely on light for fast, efficient data transfer (Shastri et al. 2021). Light can propagate with much less loss than semiconductors’ electrons, enabling inherent speed and efficiency benefits.\nSome specific advantages of optical computing include:\n\nHigh throughput: Photons can transmit with bandwidths >100 Tb/s using wavelength division multiplexing.\nLow latency: Photons interact on femtosecond timescales, millions faster than silicon transistors.\nParallelism: Multiple data signals can propagate simultaneously through the same optical medium.\nLow power: Photonic circuits utilizing waveguides and resonators can achieve complex logic and memory with only microwatts of power.\n\nHowever, optical computing currently faces significant challenges:\n\nLack of optical memory equivalent to electronic RAM\nRequires conversion between optical and electrical domains.\nLimited set of available optical components compared to rich electronics ecosystem.\nImmature integration methods to combine photonics with traditional CMOS chips.\nComplex programming models required to handle parallelism.\n\nAs a result, optical computing is still in the very early research stage despite its promising potential. However, technical breakthroughs could enable it to complement electronics and unlock performance gains for AI workloads. Companies like Lightmatter are pioneering early optical AI accelerators. In the long term, if key challenges are overcome, it could represent a revolutionary computing substrate.\n\n\n10.8.7 Quantum Computing\nQuantum computers leverage unique phenomena of quantum physics, like superposition and entanglement, to represent and process information in ways not possible classically. Instead of binary bits, the fundamental unit is the quantum bit or qubit. Unlike classical bits, which are limited to 0 or 1, qubits can exist simultaneously in a superposition of both states due to quantum effects.\nMultiple qubits can also be entangled, leading to exponential information density but introducing probabilistic results. Superposition enables parallel computation on all possible states, while entanglement allows nonlocal correlations between qubits.\nQuantum algorithms carefully manipulate these inherently quantum mechanical effects to solve problems like optimization or search more efficiently than their classical counterparts in theory.\n\nFaster training of deep neural networks by exploiting quantum parallelism for linear algebra operations.\nEfficient quantum ML algorithms make use of the unique capabilities of qubits.\nQuantum neural networks with inherent quantum effects baked into the model architecture.\nQuantum optimizers leveraging quantum annealing or adiabatic algorithms for combinatorial optimization problems.\n\nHowever, quantum states are fragile and prone to errors that require error-correcting protocols. The non-intuitive nature of quantum programming also introduces challenges not present in classical computing.\n\nNoisy and fragile quantum bits are difficult to scale up. The largest quantum computer today has less than 100 qubits.\nRestricted set of available quantum gates and circuits relative to classical programming.\nLack of datasets and benchmarks to evaluate quantum ML in practical domains.\n\nWhile meaningful quantum advantage for ML remains far off, active research at companies like D-Wave, Rigetti, and IonQ is advancing quantum computer engineering and quantum algorithms. Major technology companies like Google, IBM, and Microsoft are actively exploring quantum computing. Google recently announced a 72-qubit quantum processor called Bristlecone and plans to build a 49-qubit commercial quantum system. Microsoft also has an active research program in topological quantum computing and collaborates with quantum startup IonQ\nQuantum techniques may first make inroads into optimization before more generalized ML adoption. Realizing quantum ML’s full potential awaits major milestones in quantum hardware development and ecosystem maturity.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1098,9 +1098,9 @@ "href": "contents/hw_acceleration/hw_acceleration.html#future-trends", "title": "10  AI Acceleration", "section": "10.9 Future Trends", - "text": "10.9 Future Trends\nIn this chapter, the primary focus has been on designing specialized hardware optimized for machine learning workloads and algorithms. This discussion encompassed the tailored architectures of GPUs and TPUs for neural network training and inference. However, an emerging research direction is leveraging machine learning to facilitate the hardware design process itself.\nThe hardware design process involves many complex stages, including specification, high-level modeling, simulation, synthesis, verification, prototyping, and fabrication. Much of this process traditionally requires extensive human expertise, effort, and time. However, recent advances in machine learning are enabling parts of the hardware design workflow to be automated and enhanced using ML techniques.\nSome examples of how ML is transforming hardware design include:\n\nAutomated circuit synthesis using reinforcement learning: Rather than hand-crafting transistor-level designs, ML agents such as reinforcement learning can learn to connect logic gates and generate circuit layouts automatically. This can accelerate the time-consuming synthesis process.\nML-based hardware simulation and emulation: Deep neural network models can be trained to predict how a hardware design will perform under different conditions. For instance, deep learning models can be trained to predict cycle counts for given workloads. This allows faster and more accurate simulation than traditional RTL simulations.\nAutomated chip floorplanning using ML algorithms: Chip floorplanning involves optimally placing different components on a die. Evolutionary algorithms like genetic algorithms and other ML algorithms like reinforcement learning are used to explore floorplan options. This can significantly improve manual floorplanning placements in terms of faster turnaround time and quality of placements.\nML-driven architecture optimization: Novel hardware architectures, like those for efficient ML accelerators, can be automatically generated and optimized by searching the architectural design space. Machine learning algorithms can effectively search large architectural design spaces.\n\nApplying ML to hardware design automation holds enormous promise to make the process faster, cheaper, and more efficient. It opens up design possibilities that would require more than manual design. The use of ML in hardware design is an area of active research and early deployment, and we will study the techniques involved and their transformative potential.\n\n10.9.1 ML for Hardware Design Automation\nA major opportunity for machine learning in hardware design is automating parts of the complex and tedious design workflow. Hardware design automation (HDA) broadly refers to using ML techniques like reinforcement learning, genetic algorithms, and neural networks to automate tasks like synthesis, verification, floorplanning, and more. Here are a few examples of where ML for HDA shows real promise:\n\nAutomated circuit synthesis: Circuit synthesis involves converting a high-level description of desired logic into an optimized gate-level netlist implementation. This complex process has many design considerations and tradeoffs. ML agents can be trained through reinforcement learning (Qian et al. (2024),G. Zhou and Anderson (2023)) to explore the design space and automatically output optimized syntheses. Startups like Symbiotic EDA are bringing this technology to market.\nAutomated chip floorplanning: Floorplanning refers to strategically placing different components on a chip die area. Search algorithms like genetic algorithms (Valenzuela and Wang (2000)) and reinforcement learning (Mirhoseini et al. (2021), Agnesina et al. (2023)) can be used to automate floorplan optimization to minimize wire length, power consumption, and other objectives. These automated ML-assisted floor planners are extremely valuable as chip complexity increases.\nML hardware simulators: Training deep neural network models to predict how hardware designs will perform as simulators can accelerate the simulation process by over 100x compared to traditional architectural and RTL simulations.\nAutomated code translation: Converting hardware description languages like Verilog to optimized RTL implementations is critical but time-consuming. ML models can be trained to act as translator agents and automate this process.\n\n\nQian, Yu, Xuegong Zhou, Hao Zhou, and Lingli Wang. 2024. “An Efficient Reinforcement Learning Based Framework for Exploring Logic Synthesis.” ACM Trans. Des. Autom. Electron. Syst. 29 (2): 1–33. https://doi.org/10.1145/3632174.\n\nZhou, Guanglei, and Jason H. Anderson. 2023. “Area-Driven FPGA Logic Synthesis Using Reinforcement Learning.” In Proceedings of the 28th Asia and South Pacific Design Automation Conference, 159–65. ACM. https://doi.org/10.1145/3566097.3567894.\n\nValenzuela, Christine L, and Pearl Y Wang. 2000. “A Genetic Algorithm for VLSI Floorplanning.” In Parallel Problem Solving from Nature PPSN VI: 6th International Conference Paris, France, September 1820, 2000 Proceedings 6, 671–80. Springer.\n\nMirhoseini, Azalia, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, et al. 2021. “A Graph Placement Methodology for Fast Chip Design.” Nature 594 (7862): 207–12. https://doi.org/10.1038/s41586-021-03544-w.\n\nAgnesina, Anthony, Puranjay Rajvanshi, Tian Yang, Geraldo Pradipta, Austin Jiao, Ben Keller, Brucek Khailany, and Haoxing Ren. 2023. “AutoDMP: Automated DREAMPlace-Based Macro Placement.” In Proceedings of the 2023 International Symposium on Physical Design, 149–57. ACM. https://doi.org/10.1145/3569052.3578923.\nThe benefits of HDA using ML are reduced design time, superior optimizations, and exploration of design spaces too complex for manual approaches. This can accelerate hardware development and lead to better designs.\nChallenges include limits of ML generalization, the black-box nature of some techniques, and accuracy tradeoffs. However, research is rapidly advancing to address these issues and make HDA ML solutions robust and reliable for production use. HDA provides a major avenue for ML to transform hardware design.\n\n\n10.9.2 ML-Based Hardware Simulation and Verification\nSimulating and verifying hardware designs is critical before manufacturing to ensure the design behaves as intended. Traditional approaches like register-transfer level (RTL) simulation are complex and time-consuming. ML introduces new opportunities to enhance hardware simulation and verification. Some examples include:\n\nSurrogate modeling for simulation: Highly accurate surrogate models of a design can be built using neural networks. These models predict outputs from inputs much faster than RTL simulation, enabling fast design space exploration. Companies like Ansys use this technique.\nML simulators: Large neural network models can be trained on RTL simulations to learn to mimic the functionality of a hardware design. Once trained, the NN model can be a highly efficient simulator for regression testing and other tasks. Graphcore has demonstrated over 100x speedup with this approach.\nFormal verification using ML: Formal verification mathematically proves properties about a design. ML techniques can help generate verification properties and learn to solve the complex formal proofs needed, automating parts of this challenging process. Startups like Cortical.io are bringing formal ML verification solutions to the market.\nBug detection: ML models can be trained to process hardware designs and identify potential issues. This assists human designers in inspecting complex designs and finding bugs. Facebook has shown bug detection models for their server hardware.\n\nThe key benefits of applying ML to simulation and verification are faster design validation turnaround times, more rigorous testing, and reduced human effort. Challenges include verifying ML model correctness and handling corner cases. ML promises to accelerate testing workflows significantly.\n\n\n10.9.3 ML for Efficient Hardware Architectures\nA key goal is designing hardware architectures optimized for performance, power, and efficiency. ML introduces new techniques to automate and enhance architecture design space exploration for general-purpose and specialized hardware like ML accelerators. Some promising examples include:\n\nArchitecture search for hardware: Search techniques like evolutionary algorithms (Kao and Krishna (2020)), Bayesian optimization (Reagen et al. (2017), Bhardwaj et al. (2020)), reinforcement learning (Kao, Jeong, and Krishna (2020), Krishnan et al. (2022)) can automatically generate novel hardware architectures by mutating and mixing design attributes like cache size, number of parallel units, memory bandwidth, and so on. This allows for efficient navigation of large design spaces.\nPredictive modeling for optimization: - ML models can be trained to predict hardware performance, power, and efficiency metrics for a given architecture. These become “surrogate models” (Krishnan et al. (2023)) for fast optimization and space exploration by substituting lengthy simulations.\nSpecialized accelerator optimization: - For specialized chips like tensor processing units for AI, automated architecture search techniques based on ML algorithms (D. Zhang et al. (2022)) show promise for finding fast, efficient designs.\n\n\nKao, Sheng-Chun, and Tushar Krishna. 2020. “Gamma: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm.” In Proceedings of the 39th International Conference on Computer-Aided Design, 1–9. ACM. https://doi.org/10.1145/3400302.3415639.\n\nReagen, Brandon, Jose Miguel Hernandez-Lobato, Robert Adolf, Michael Gelbart, Paul Whatmough, Gu-Yeon Wei, and David Brooks. 2017. “A Case for Efficient Accelerator Design Space Exploration via Bayesian Optimization.” In 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 1–6. IEEE; IEEE. https://doi.org/10.1109/islped.2017.8009208.\n\nBhardwaj, Kshitij, Marton Havasi, Yuan Yao, David M. Brooks, José Miguel Hernández-Lobato, and Gu-Yeon Wei. 2020. “A Comprehensive Methodology to Determine Optimal Coherence Interfaces for Many-Accelerator SoCs.” In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, 145–50. ACM. https://doi.org/10.1145/3370748.3406564.\n\nKao, Sheng-Chun, Geonhwa Jeong, and Tushar Krishna. 2020. “ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators Using Reinforcement Learning.” In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 622–36. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00058.\n\nKrishnan, Srivatsan, Natasha Jaques, Shayegan Omidshafiei, Dan Zhang, Izzeddin Gur, Vijay Janapa Reddi, and Aleksandra Faust. 2022. “Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration.” https://arxiv.org/abs/2211.16385.\n\nZhang, Dan, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, and Azalia Mirhoseini. 2022. “A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators.” In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 27–42. ASPLOS ’22. New York, NY, USA: ACM. https://doi.org/10.1145/3503222.3507767.\nThe benefits of using ML include superior design space exploration, automated optimization, and reduced manual effort. Challenges include long training times for some techniques and local optima limitations. However, ML for hardware architecture holds great potential for unlocking performance and efficiency gains.\n\n\n10.9.4 ML to Optimize Manufacturing and Reduce Defects\nOnce a hardware design is complete, it moves to manufacturing. However, variability and defects during manufacturing can impact yields and quality. ML techniques are now being applied to improve fabrication processes and reduce defects. Some examples include:\n\nPredictive maintenance: ML models can analyze equipment sensor data over time and identify signals that predict maintenance needs before failure. This enables proactive upkeep, which can be very handy in the costly fabrication process.\nProcess optimization: Supervised learning models can be trained on process data to identify factors that lead to low yields. The models can then optimize parameters to improve yields, throughput, or consistency.\nYield prediction: By analyzing test data from fabricated designs using techniques like regression trees, ML models can predict yields early in production, allowing process adjustments.\nDefect detection: Computer vision ML techniques can be applied to images of designs to identify defects invisible to the human eye. This enables precision quality control and root cause analysis.\nProactive failure analysis: - ML models can help predict, diagnose, and prevent issues that lead to downstream defects and failures by analyzing structured and unstructured process data.\n\nApplying ML to manufacturing enables process optimization, real-time quality control, predictive maintenance, and higher yields. Challenges include managing complex manufacturing data and variations. But ML is poised to transform semiconductor manufacturing.\n\n\n10.9.5 Toward Foundation Models for Hardware Design\nAs we have seen, machine learning is opening up new possibilities across the hardware design workflow, from specification to manufacturing. However, current ML techniques are still narrow in scope and require extensive domain-specific engineering. The long-term vision is the development of general artificial intelligence systems that can be applied with versatility across hardware design tasks.\nTo fully realize this vision, investment, and research are needed to develop foundation models for hardware design. These are unified, general-purpose ML models and architectures that can learn complex hardware design skills with the right training data and objectives.\nRealizing foundation models for end-to-end hardware design will require the following:\n\nAccumulate large, high-quality, labeled datasets across hardware design stages to train foundation models.\nAdvances in multi-modal, multi-task ML techniques to handle the diversity of hardware design data and tasks.\nInterfaces and abstraction layers to connect foundation models to existing design flows and tools.\nDevelopment of simulation environments and benchmarks to train and test foundation models on hardware design capabilities.\nMethods to explain and interpret ML models’ design decisions and optimizations for trust and verification.\nCompilation techniques to optimize foundation models for efficient deployment across hardware platforms.\n\nWhile significant research remains, foundation models represent the most transformative long-term goal for imbuing AI into the hardware design process. Democratizing hardware design via versatile, automated ML systems promises to unlock a new era of optimized, efficient, and innovative chip design. The journey ahead is filled with open challenges and opportunities.\nIf you are interested in ML-aided computer architecture design (Krishnan et al. 2023), we encourage you to read Architecture 2.0.\n\nKrishnan, Srivatsan, Amir Yazdanbakhsh, Shvetank Prakash, Jason Jabbour, Ikechukwu Uchendu, Susobhan Ghosh, Behzad Boroujerdian, et al. 2023. “ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design.” In Proceedings of the 50th Annual International Symposium on Computer Architecture, 1–16. ACM. https://doi.org/10.1145/3579371.3589049.\nAlternatively, you can watch the below video.", + "text": "10.9 Future Trends\nIn this chapter, the primary focus has been on designing specialized hardware optimized for machine learning workloads and algorithms. This discussion encompassed the tailored architectures of GPUs and TPUs for neural network training and inference. However, an emerging research direction is leveraging machine learning to facilitate the hardware design process itself.\nThe hardware design process involves many complex stages, including specification, high-level modeling, simulation, synthesis, verification, prototyping, and fabrication. Much of this process traditionally requires extensive human expertise, effort, and time. However, recent advances in machine learning are enabling parts of the hardware design workflow to be automated and enhanced using ML techniques.\nSome examples of how ML is transforming hardware design include:\n\nAutomated circuit synthesis using reinforcement learning: Rather than hand-crafting transistor-level designs, ML agents such as reinforcement learning can learn to connect logic gates and generate circuit layouts automatically. This can accelerate the time-consuming synthesis process.\nML-based hardware simulation and emulation: Deep neural network models can be trained to predict how a hardware design will perform under different conditions. For instance, deep learning models can be trained to predict cycle counts for given workloads. This allows faster and more accurate simulation than traditional RTL simulations.\nAutomated chip floorplanning using ML algorithms: Chip floorplanning involves optimally placing different components on a die. Evolutionary algorithms like genetic algorithms and other ML algorithms like reinforcement learning are used to explore floorplan options. This can significantly improve manual floorplanning placements in terms of faster turnaround time and quality of placements.\nML-driven architecture optimization: Novel hardware architectures, like those for efficient ML accelerators, can be automatically generated and optimized by searching the architectural design space. Machine learning algorithms can effectively search large architectural design spaces.\n\nApplying ML to hardware design automation holds enormous promise to make the process faster, cheaper, and more efficient. It opens up design possibilities that would require more than manual design. The use of ML in hardware design is an area of active research and early deployment, and we will study the techniques involved and their transformative potential.\n\n10.9.1 ML for Hardware Design Automation\nA major opportunity for machine learning in hardware design is automating parts of the complex and tedious design workflow. Hardware design automation (HDA) broadly refers to using ML techniques like reinforcement learning, genetic algorithms, and neural networks to automate tasks like synthesis, verification, floorplanning, and more. Here are a few examples of where ML for HDA shows real promise:\n\nAutomated circuit synthesis: Circuit synthesis involves converting a high-level description of desired logic into an optimized gate-level netlist implementation. This complex process has many design considerations and tradeoffs. ML agents can be trained through reinforcement learning (Qian et al. (2024),G. Zhou and Anderson (2023)) to explore the design space and automatically output optimized syntheses. Startups like Symbiotic EDA are bringing this technology to market.\nAutomated chip floorplanning: Floorplanning refers to strategically placing different components on a chip die area. Search algorithms like genetic algorithms (Valenzuela and Wang (2000)) and reinforcement learning (Mirhoseini et al. (2021), Agnesina et al. (2023)) can be used to automate floorplan optimization to minimize wire length, power consumption, and other objectives. These automated ML-assisted floor planners are extremely valuable as chip complexity increases.\nML hardware simulators: Training deep neural network models to predict how hardware designs will perform as simulators can accelerate the simulation process by over 100x compared to traditional architectural and RTL simulations.\nAutomated code translation: Converting hardware description languages like Verilog to optimized RTL implementations is critical but time-consuming. ML models can be trained to act as translator agents and automate this process.\n\n\nQian, Yu, Xuegong Zhou, Hao Zhou, and Lingli Wang. 2024. “An Efficient Reinforcement Learning Based Framework for Exploring Logic Synthesis.” ACM Trans. Des. Autom. Electron. Syst. 29 (2): 1–33. https://doi.org/10.1145/3632174.\n\nZhou, Guanglei, and Jason H. Anderson. 2023. “Area-Driven FPGA Logic Synthesis Using Reinforcement Learning.” In Proceedings of the 28th Asia and South Pacific Design Automation Conference, 159–65. ACM. https://doi.org/10.1145/3566097.3567894.\n\nValenzuela, Christine L, and Pearl Y Wang. 2000. “A Genetic Algorithm for VLSI Floorplanning.” In Parallel Problem Solving from Nature PPSN VI: 6th International Conference Paris, France, September 1820, 2000 Proceedings 6, 671–80. Springer.\n\nMirhoseini, Azalia, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, et al. 2021. “A Graph Placement Methodology for Fast Chip Design.” Nature 594 (7862): 207–12. https://doi.org/10.1038/s41586-021-03544-w.\n\nAgnesina, Anthony, Puranjay Rajvanshi, Tian Yang, Geraldo Pradipta, Austin Jiao, Ben Keller, Brucek Khailany, and Haoxing Ren. 2023. “AutoDMP: Automated DREAMPlace-Based Macro Placement.” In Proceedings of the 2023 International Symposium on Physical Design, 149–57. ACM. https://doi.org/10.1145/3569052.3578923.\nThe benefits of HDA using ML are reduced design time, superior optimizations, and exploration of design spaces too complex for manual approaches. This can accelerate hardware development and lead to better designs.\nChallenges include limits of ML generalization, the black-box nature of some techniques, and accuracy tradeoffs. However, research is rapidly advancing to address these issues and make HDA ML solutions robust and reliable for production use. HDA provides a major avenue for ML to transform hardware design.\n\n\n10.9.2 ML-Based Hardware Simulation and Verification\nSimulating and verifying hardware designs is critical before manufacturing to ensure the design behaves as intended. Traditional approaches like register-transfer level (RTL) simulation are complex and time-consuming. ML introduces new opportunities to enhance hardware simulation and verification. Some examples include:\n\nSurrogate modeling for simulation: Highly accurate surrogate models of a design can be built using neural networks. These models predict outputs from inputs much faster than RTL simulation, enabling fast design space exploration. Companies like Ansys use this technique.\nML simulators: Large neural network models can be trained on RTL simulations to learn to mimic the functionality of a hardware design. Once trained, the NN model can be a highly efficient simulator for regression testing and other tasks. Graphcore has demonstrated over 100x speedup with this approach.\nFormal verification using ML: Formal verification mathematically proves properties about a design. ML techniques can help generate verification properties and learn to solve the complex formal proofs needed, automating parts of this challenging process. Startups like Cortical.io are bringing formal ML verification solutions to the market.\nBug detection: ML models can be trained to process hardware designs and identify potential issues. This assists human designers in inspecting complex designs and finding bugs. Facebook has shown bug detection models for their server hardware.\n\nThe key benefits of applying ML to simulation and verification are faster design validation turnaround times, more rigorous testing, and reduced human effort. Challenges include verifying ML model correctness and handling corner cases. ML promises to accelerate testing workflows significantly.\n\n\n10.9.3 ML for Efficient Hardware Architectures\nA key goal is designing hardware architectures optimized for performance, power, and efficiency. ML introduces new techniques to automate and enhance architecture design space exploration for general-purpose and specialized hardware like ML accelerators. Some promising examples include:\n\nArchitecture search for hardware: Search techniques like evolutionary algorithms (Kao and Krishna (2020)), Bayesian optimization (Reagen et al. (2017), Bhardwaj et al. (2020)), reinforcement learning (Kao, Jeong, and Krishna (2020), Krishnan et al. (2022)) can automatically generate novel hardware architectures by mutating and mixing design attributes like cache size, number of parallel units, memory bandwidth, and so on. This allows for efficient navigation of large design spaces.\nPredictive modeling for optimization: - ML models can be trained to predict hardware performance, power, and efficiency metrics for a given architecture. These become “surrogate models” (Krishnan et al. (2023)) for fast optimization and space exploration by substituting lengthy simulations.\nSpecialized accelerator optimization: - For specialized chips like tensor processing units for AI, automated architecture search techniques based on ML algorithms (D. Zhang et al. (2022)) show promise for finding fast, efficient designs.\n\n\nKao, Sheng-Chun, and Tushar Krishna. 2020. “Gamma: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm.” In Proceedings of the 39th International Conference on Computer-Aided Design, 1–9. ACM. https://doi.org/10.1145/3400302.3415639.\n\nReagen, Brandon, Jose Miguel Hernandez-Lobato, Robert Adolf, Michael Gelbart, Paul Whatmough, Gu-Yeon Wei, and David Brooks. 2017. “A Case for Efficient Accelerator Design Space Exploration via Bayesian Optimization.” In 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 1–6. IEEE; IEEE. https://doi.org/10.1109/islped.2017.8009208.\n\nBhardwaj, Kshitij, Marton Havasi, Yuan Yao, David M. Brooks, José Miguel Hernández-Lobato, and Gu-Yeon Wei. 2020. “A Comprehensive Methodology to Determine Optimal Coherence Interfaces for Many-Accelerator SoCs.” In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, 145–50. ACM. https://doi.org/10.1145/3370748.3406564.\n\nKao, Sheng-Chun, Geonhwa Jeong, and Tushar Krishna. 2020. “ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators Using Reinforcement Learning.” In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 622–36. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00058.\n\nKrishnan, Srivatsan, Natasha Jaques, Shayegan Omidshafiei, Dan Zhang, Izzeddin Gur, Vijay Janapa Reddi, and Aleksandra Faust. 2022. “Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration.” https://arxiv.org/abs/2211.16385.\n\nZhang, Dan, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, and Azalia Mirhoseini. 2022. “A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators.” In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 27–42. ASPLOS ’22. New York, NY, USA: ACM. https://doi.org/10.1145/3503222.3507767.\nThe benefits of using ML include superior design space exploration, automated optimization, and reduced manual effort. Challenges include long training times for some techniques and local optima limitations. However, ML for hardware architecture holds great potential for unlocking performance and efficiency gains.\n\n\n10.9.4 ML to Optimize Manufacturing and Reduce Defects\nOnce a hardware design is complete, it moves to manufacturing. However, variability and defects during manufacturing can impact yields and quality. ML techniques are now being applied to improve fabrication processes and reduce defects. Some examples include:\n\nPredictive maintenance: ML models can analyze equipment sensor data over time and identify signals that predict maintenance needs before failure. This enables proactive upkeep, which can be very handy in the costly fabrication process.\nProcess optimization: Supervised learning models can be trained on process data to identify factors that lead to low yields. The models can then optimize parameters to improve yields, throughput, or consistency.\nYield prediction: By analyzing test data from fabricated designs using techniques like regression trees, ML models can predict yields early in production, allowing process adjustments.\nDefect detection: Computer vision ML techniques can be applied to images of designs to identify defects invisible to the human eye. This enables precision quality control and root cause analysis.\nProactive failure analysis: - ML models can help predict, diagnose, and prevent issues that lead to downstream defects and failures by analyzing structured and unstructured process data.\n\nApplying ML to manufacturing enables process optimization, real-time quality control, predictive maintenance, and higher yields. Challenges include managing complex manufacturing data and variations. But ML is poised to transform semiconductor manufacturing.\n\n\n10.9.5 Toward Foundation Models for Hardware Design\nAs we have seen, machine learning is opening up new possibilities across the hardware design workflow, from specification to manufacturing. However, current ML techniques are still narrow in scope and require extensive domain-specific engineering. The long-term vision is the development of general artificial intelligence systems that can be applied with versatility across hardware design tasks.\nTo fully realize this vision, investment, and research are needed to develop foundation models for hardware design. These are unified, general-purpose ML models and architectures that can learn complex hardware design skills with the right training data and objectives.\nRealizing foundation models for end-to-end hardware design will require the following:\n\nAccumulate large, high-quality, labeled datasets across hardware design stages to train foundation models.\nAdvances in multi-modal, multi-task ML techniques to handle the diversity of hardware design data and tasks.\nInterfaces and abstraction layers to connect foundation models to existing design flows and tools.\nDevelopment of simulation environments and benchmarks to train and test foundation models on hardware design capabilities.\nMethods to explain and interpret ML models’ design decisions and optimizations for trust and verification.\nCompilation techniques to optimize foundation models for efficient deployment across hardware platforms.\n\nWhile significant research remains, foundation models represent the most transformative long-term goal for imbuing AI into the hardware design process. Democratizing hardware design via versatile, automated ML systems promises to unlock a new era of optimized, efficient, and innovative chip design. The journey ahead is filled with open challenges and opportunities.\nIf you are interested in ML-aided computer architecture design (Krishnan et al. 2023), we encourage you to read Architecture 2.0.\n\nKrishnan, Srivatsan, Amir Yazdanbakhsh, Shvetank Prakash, Jason Jabbour, Ikechukwu Uchendu, Susobhan Ghosh, Behzad Boroujerdian, et al. 2023. “ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design.” In Proceedings of the 50th Annual International Symposium on Computer Architecture, 1–16. ACM. https://doi.org/10.1145/3579371.3589049.\nAlternatively, you can watch Video 10.3 for more details.\n\n\n\n\n\n\nVideo 10.3: Architecture 2.0", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1111,7 +1111,7 @@ "section": "10.10 Conclusion", "text": "10.10 Conclusion\nSpecialized hardware acceleration has become indispensable for enabling performant and efficient artificial intelligence applications as models and datasets explode in complexity. This chapter examined the limitations of general-purpose processors like CPUs for AI workloads. Their lack of parallelism and computational throughput cannot train or run state-of-the-art deep neural networks quickly. These motivations have driven innovations in customized accelerators.\nWe surveyed GPUs, TPUs, FPGAs, and ASICs specifically designed for the math-intensive operations inherent to neural networks. By covering this spectrum of options, we aimed to provide a framework for reasoning through accelerator selection based on constraints around flexibility, performance, power, cost, and other factors.\nWe also explored the role of software in actively enabling and optimizing AI acceleration. This spans programming abstractions, frameworks, compilers, and simulators. We discussed hardware-software co-design as a proactive methodology for building more holistic AI systems by closely integrating algorithm innovation and hardware advances.\nBut there is so much more to come! Exciting frontiers like analog computing, optical neural networks, and quantum machine learning represent active research directions that could unlock orders of magnitude improvements in efficiency, speed, and scale compared to present paradigms.\nUltimately, specialized hardware acceleration remains indispensable for unlocking the performance and efficiency necessary to fulfill the promise of artificial intelligence from cloud to edge. We hope this chapter provides useful background and insights into the rapid innovation occurring in this domain.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1120,9 +1120,9 @@ "href": "contents/hw_acceleration/hw_acceleration.html#sec-ai-acceleration-resource", "title": "10  AI Acceleration", "section": "10.11 Resources", - "text": "10.11 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\n\nExercise 10.1\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\n\nComing soon.", + "text": "10.11 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 10.1\nVideo 10.2\nVideo 10.3\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\n\nExercise 10.1\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\n\nComing soon.", "crumbs": [ - "MAIN", + "Training", "10  AI Acceleration" ] }, @@ -1133,7 +1133,7 @@ "section": "", "text": "11.1 Introduction\nBenchmarking provides the essential measurements needed to drive machine learning progress and truly understand system performance. As the physicist Lord Kelvin famously said, “To measure is to know.” Benchmarks allow us to quantitatively know the capabilities of different models, software, and hardware. They allow ML developers to measure the inference time, memory usage, power consumption, and other metrics that characterize a system. Moreover, benchmarks create standardized processes for measurement, enabling fair comparisons across different solutions.\nWhen benchmarks are maintained over time, they become instrumental in capturing progress across generations of algorithms, datasets, and hardware. The models and techniques that set new records on ML benchmarks from one year to the next demonstrate tangible improvements in what’s possible for on-device machine learning. By using benchmarks to measure, ML practitioners can know the real-world capabilities of their systems and have confidence that each step reflects genuine progress towards the state-of-the-art.\nBenchmarking has several important goals and objectives that guide its implementation for machine learning systems.\nThis chapter will cover the 3 types of AI benchmarks, the standard metrics, tools, and techniques designers use to optimize their systems, and the challenges and trends in benchmarking.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1144,7 +1144,7 @@ "section": "", "text": "Performance assessment. This involves evaluating key metrics like a given model’s speed, accuracy, and efficiency. For instance, in a TinyML context, it is crucial to benchmark how quickly a voice assistant can recognize commands, as this evaluates real-time performance.\nResource evaluation. This means assessing the model’s impact on critical system resources, including battery life, memory usage, and computational overhead. A relevant example is comparing the battery drain of two different image recognition algorithms running on a wearable device.\nValidation and verification. Benchmarking helps ensure the system functions correctly and meets specified requirements. One way is by checking the accuracy of an algorithm, like a heart rate monitor on a smartwatch, against readings from medical-grade equipment as a form of clinical validation.\nCompetitive analysis. This enables comparing solutions against competing offerings in the market. For example, benchmarking a custom object detection model versus common TinyML benchmarks like MobileNet and Tiny-YOLO.\nCredibility. Accurate benchmarks uphold the credibility of AI solutions and the organizations that develop them. They demonstrate a commitment to transparency, honesty, and quality, which are essential in building trust with users and stakeholders.\nRegulation and Standardization. As the AI industry continues to grow, there is an increasing need for regulation and standardization to ensure that AI solutions are safe, ethical, and effective. Accurate and reliable benchmarks are essential to this regulatory framework, as they provide the data and evidence needed to assess compliance with industry standards and legal requirements.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1155,7 +1155,7 @@ "section": "11.2 Historical Context", "text": "11.2 Historical Context\n\n11.2.1 Standard Benchmarks\nThe evolution of benchmarks in computing vividly illustrates the industry’s relentless pursuit of excellence and innovation. In the early days of computing during the 1960s and 1970s, benchmarks were rudimentary and designed for mainframe computers. For example, the Whetstone benchmark, named after the Whetstone ALGOL compiler, was one of the first standardized tests to measure the floating-point arithmetic performance of a CPU. These pioneering benchmarks prompted manufacturers to refine their architectures and algorithms to achieve better benchmark scores.\nThe 1980s marked a significant shift with the rise of personal computers. As companies like IBM, Apple, and Commodore competed for market share, and so benchmarks became critical tools to enable fair competition. The SPEC CPU benchmarks, introduced by the System Performance Evaluation Cooperative (SPEC), established standardized tests allowing objective comparisons between different machines. This standardization created a competitive environment, pushing silicon manufacturers and system creators to continually enhance their hardware and software offerings.\nThe 1990s brought the era of graphics-intensive applications and video games. The need for benchmarks to evaluate graphics card performance led to Futuremark’s creation of 3DMark. As gamers and professionals sought high-performance graphics cards, companies like NVIDIA and AMD were driven to rapid innovation, leading to major advancements in GPU technology like programmable shaders.\nThe 2000s saw a surge in mobile phones and portable devices like tablets. With portability came the challenge of balancing performance and power consumption. Benchmarks like MobileMark by BAPCo evaluated speed and battery life. This drove companies to develop more energy-efficient System-on-Chips (SOCs), leading to the emergence of architectures like ARM that prioritized power efficiency.\nThe focus of the recent decade has shifted towards cloud computing, big data, and artificial intelligence. Cloud service providers like Amazon Web Services and Google Cloud compete on performance, scalability, and cost-effectiveness. Tailored cloud benchmarks like CloudSuite have become essential, driving providers to optimize their infrastructure for better services.\n\n\n11.2.2 Custom Benchmarks\nIn addition to industry-standard benchmarks, there are custom benchmarks specifically designed to meet the unique requirements of a particular application or task. They are tailored to the specific needs of the user or developer, ensuring that the performance metrics are directly relevant to the intended use of the AI model or system. Custom benchmarks can be created by individual organizations, researchers, or developers and are often used in conjunction with industry-standard benchmarks to provide a comprehensive evaluation of AI performance.\nFor example, a hospital could develop a benchmark to assess an AI model for predicting patient readmission. This benchmark would incorporate metrics relevant to the hospital’s patient population, like demographics, medical history, and social factors. Similarly, a financial institution’s fraud detection benchmark could focus on identifying fraudulent transactions accurately while minimizing false positives. In automotive, an autonomous vehicle benchmark may prioritize performance in diverse conditions, responding to obstacles, and safety. Retailers could benchmark recommendation systems using click-through rate, conversion rate, and customer satisfaction. Manufacturing companies might benchmark quality control systems on defect identification, efficiency, and waste reduction. In each industry, custom benchmarks provide organizations with evaluation criteria tailored to their unique needs and context. This allows for a more meaningful assessment of how well AI systems meet requirements.\nThe advantage of custom benchmarks lies in their flexibility and relevance. They can be designed to test specific performance aspects critical to the success of the AI solution in its intended application. This allows for a more targeted and accurate assessment of the AI model or system’s capabilities. Custom benchmarks also provide valuable insights into the performance of AI solutions in real-world scenarios, which can be crucial for identifying potential issues and areas for improvement.\nIn AI, benchmarks play a crucial role in driving progress and innovation. While benchmarks have long been used in computing, their application to machine learning is relatively recent. AI-focused benchmarks provide standardized metrics to evaluate and compare the performance of different algorithms, model architectures, and hardware platforms.\n\n\n11.2.3 Community Consensus\nA key prerogative for any benchmark to be impactful is that it must reflect the shared priorities and values of the broader research community. Benchmarks designed in isolation risk failing to gain acceptance if they overlook key metrics considered important by leading groups. Through collaborative development with open participation from academic labs, companies, and other stakeholders, benchmarks can incorporate collective input on critical capabilities worth measuring. This helps ensure the benchmarks evaluate aspects the community agrees are essential to advance the field. The process of reaching alignment on tasks and metrics itself supports converging on what matters most.\nFurthermore, benchmarks published with broad co-authorship from respected institutions carry authority and validity that convinces the community to adopt them as trusted standards. Benchmarks perceived as biased by particular corporate or institutional interests breed skepticism. Ongoing community engagement through workshops and challenges is also key after the initial release, and that is what, for instance, led to the success of ImageNet. As research progresses, collective participation enables continual refinement and expansion of benchmarks over time.\nFinally, community-developed benchmarks released with open access accelerate adoption and consistent implementation. We shared open-source code, documentation, models, and infrastructure to lower barriers for groups to benchmark solutions on an equal footing using standardized implementations. This consistency is critical for fair comparisons. Without coordination, labs and companies may implement benchmarks differently, reducing result reproducibility.\nCommunity consensus brings benchmarks lasting relevance, while fragmentation confuses. Through collaborative development and transparent operation, benchmarks can become authoritative standards for tracking progress. Several of the benchmarks that we discuss in this chapter were developed and built by the community, for the community, and that is what ultimately led to their success.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1166,7 +1166,7 @@ "section": "11.3 AI Benchmarks: System, Model, and Data", "text": "11.3 AI Benchmarks: System, Model, and Data\nThe need for comprehensive benchmarking becomes paramount as AI systems grow in complexity and ubiquity. Within this context, benchmarks are often classified into three primary categories: Hardware, Model, and Data. Let’s delve into why each of these buckets is essential and the significance of evaluating AI from these three distinct dimensions:\n\n11.3.1 System Benchmarks\nAI computations, especially those in deep learning, are resource-intensive. The hardware on which these computations run plays an important role in determining AI solutions’ speed, efficiency, and scalability. Consequently, hardware benchmarks help evaluate the performance of CPUs, GPUs, TPUs, and other accelerators in AI tasks. By understanding hardware performance, developers can choose which hardware platforms best suit specific AI applications. Furthermore, hardware manufacturers use these benchmarks to identify areas for improvement, driving innovation in AI-specific chip designs.\n\n\n11.3.2 Model Benchmarks\nThe architecture, size, and complexity of AI models vary widely. Different models have different computational demands and offer varying levels of accuracy and efficiency. Model benchmarks help us assess the performance of various AI architectures on standardized tasks. They provide insights into different models’ speed, accuracy, and resource demands. By benchmarking models, researchers can identify best-performing architectures for specific tasks, guiding the AI community towards more efficient and effective solutions. Additionally, these benchmarks aid in tracking the progress of AI research, showcasing advancements in model design and optimization.\n\n\n11.3.3 Data Benchmarks\nAI, particularly machine learning, is inherently data-driven. The quality, size, and diversity of data influence AI models’ training efficacy and generalization capability. Data benchmarks focus on the datasets used in AI training and evaluation. They provide standardized datasets the community can use to train and test models, ensuring a level playing field for comparisons. Moreover, these benchmarks highlight data quality, diversity, and representation challenges, pushing the community to address biases and gaps in AI training data. By understanding data benchmarks, researchers can also gauge how models might perform in real-world scenarios, ensuring robustness and reliability.\nIn the remainder of the sections, we will discuss each of these benchmark types. The focus will be an in-depth exploration of system benchmarks, as these are critical to understanding and advancing machine learning system performance. We will briefly cover model and data benchmarks for a comprehensive perspective, but the emphasis and majority of the content will be devoted to system benchmarks.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1175,9 +1175,9 @@ "href": "contents/benchmarking/benchmarking.html#system-benchmarking", "title": "11  Benchmarking AI", "section": "11.4 System Benchmarking", - "text": "11.4 System Benchmarking\n\n11.4.1 Granularity\nMachine learning system benchmarking provides a structured and systematic approach to assessing a system’s performance across various dimensions. Given the complexity of ML systems, we can dissect their performance through different levels of granularity and obtain a comprehensive view of the system’s efficiency, identify potential bottlenecks, and pinpoint areas for improvement. To this end, various types of benchmarks have evolved over the years and continue to persist.\nFigure 11.1 illustrates the different layers of granularity of an ML system. At the application level, end-to-end benchmarks assess the overall system performance, considering factors like data preprocessing, model training, and inference. While at the model layer, benchmarks focus on assessing the efficiency and accuracy of specific models. This includes evaluating how well models generalize to new data and their computational efficiency during training and inference. Furthermore, benchmarking can extend to hardware and software infrastructure, examining the performance of individual components like GPUs or TPUs.\n\n\n\n\n\n\nFigure 11.1: ML system granularity.\n\n\n\n\nMicro Benchmarks\nMicro-benchmarks in AI are specialized, evaluating distinct components or specific operations within a broader machine learning process. These benchmarks zero in on individual tasks, offering insights into the computational demands of a particular neural network layer, the efficiency of a unique optimization technique, or the throughput of a specific activation function. For instance, practitioners might use micro-benchmarks to measure the computational time required by a convolutional layer in a deep learning model or to evaluate the speed of data preprocessing that feeds data into the model. Such granular assessments are instrumental in fine-tuning and optimizing discrete aspects of AI models, ensuring that each component operates at its peak potential.\nThese types of microbenchmarks include zooming into very specific operations or components of the AI pipeline, such as the following:\n\nTensor Operations: Libraries like cuDNN (by NVIDIA) often have benchmarks to measure the performance of individual tensor operations, such as convolutions or matrix multiplications, which are foundational to deep learning computations.\nActivation Functions: Benchmarks that measure the speed and efficiency of various activation functions like ReLU, Sigmoid, or Tanh in isolation.\nLayer Benchmarks: Evaluations of the computational efficiency of distinct neural network layers, such as LSTM or Transformer blocks, when operating on standardized input sizes.\n\nExample: DeepBench, introduced by Baidu, is a good example of something that assesses the above. DeepBench assesses the performance of basic operations in deep learning models, providing insights into how different hardware platforms handle neural network training and inference.\n\n\n\n\n\n\nExercise 11.1: System Benchmarking - Tensor Operations\n\n\n\n\n\nEver wonder how your image filters get so fast? Special libraries like cuDNN supercharge those calculations on certain hardware. In this Colab, we’ll use cuDNN with PyTorch to speed up image filtering. Think of it as a tiny benchmark, showing how the right software can unlock your GPU’s power!\n\n\n\n\n\n\nMacro Benchmarks\nMacro benchmarks provide a holistic view, assessing the end-to-end performance of entire machine learning models or comprehensive AI systems. Rather than focusing on individual operations, macro-benchmarks evaluate the collective efficacy of models under real-world scenarios or tasks. For example, a macro-benchmark might assess the complete performance of a deep learning model undertaking image classification on a dataset like ImageNet. This includes gauging accuracy, computational speed, and resource consumption. Similarly, one might measure the cumulative time and resources needed to train a natural language processing model on extensive text corpora or evaluate the performance of an entire recommendation system, from data ingestion to final user-specific outputs.\nExamples: These benchmarks evaluate the AI model:\n\nMLPerf Inference(Reddi et al. (2020)): An industry-standard set of benchmarks for measuring the performance of machine learning software and hardware. MLPerf has a suite of dedicated benchmarks for specific scales, such as MLPerf Mobile for mobile class devices and MLPerf Tiny, which focuses on microcontrollers and other resource-constrained devices.\nEEMBC’s MLMark: A benchmarking suite for evaluating the performance and power efficiency of embedded devices running machine learning workloads. This benchmark provides insights into how different hardware platforms handle tasks like image recognition or audio processing.\nAI-Benchmark(Ignatov et al. (2019)): A benchmarking tool designed for Android devices, it evaluates the performance of AI tasks on mobile devices, encompassing various real-world scenarios like image recognition, face parsing, and optical character recognition.\n\n\nReddi, Vijay Janapa, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, et al. 2020. “MLPerf Inference Benchmark.” In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 446–59. IEEE; IEEE. https://doi.org/10.1109/isca45697.2020.00045.\n\nIgnatov, Andrey, Radu Timofte, Andrei Kulik, Seungsoo Yang, Ke Wang, Felix Baum, Max Wu, Lirong Xu, and Luc Van Gool. 2019. “AI Benchmark: All about Deep Learning on Smartphones in 2019.” In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 0–0. IEEE. https://doi.org/10.1109/iccvw.2019.00447.\n\n\nEnd-to-end Benchmarks\nEnd-to-end benchmarks provide an all-inclusive evaluation that extends beyond the boundaries of the AI model itself. Instead of focusing solely on a machine learning model’s computational efficiency or accuracy, these benchmarks encompass the entire pipeline of an AI system. This includes initial data preprocessing, the core model’s performance, post-processing of the model’s outputs, and other integral components like storage and network interactions.\nData preprocessing is the first stage in many AI systems, transforming raw data into a format suitable for model training or inference. These preprocessing steps’ efficiency, scalability, and accuracy are vital for the overall system’s performance. End-to-end benchmarks assess this phase, ensuring that data cleaning, normalization, augmentation, or any other transformation process doesn’t become a bottleneck.\nThe post-processing phase also takes center stage. This involves interpreting the model’s raw outputs, possibly converting scores into meaningful categories, filtering results, or even integrating with other systems. In real-world applications, this phase is crucial for delivering actionable insights, and end-to-end benchmarks ensure it’s both efficient and effective.\nBeyond the core AI operations, other system components are important in the overall performance and user experience. Storage solutions, whether cloud-based, on-premises, or hybrid, can significantly impact data retrieval and storage times, especially with vast AI datasets. Similarly, network interactions, vital for cloud-based AI solutions or distributed systems, can become performance bottlenecks if not optimized. End-to-end benchmarks holistically evaluate these components, ensuring that the entire system operates seamlessly, from data retrieval to final output delivery.\nTo date, there are no public, end-to-end benchmarks that take into account the role of data storage, network, and compute performance. Arguably, MLPerf Training and Inference come close to the idea of an end-to-end benchmark, but they are exclusively focused on ML model performance and do not represent real-world deployment scenarios of how models are used in the field. Nonetheless, they provide a very useful signal that helps assess AI system performance.\nGiven the inherent specificity of end-to-end benchmarking, it is typically performed internally at a company by instrumenting real production deployments of AI. This allows engineers to have a realistic understanding and breakdown of the performance, but given the sensitivity and specificity of the information, it is rarely reported outside of the company.\n\n\nUnderstanding the Trade-offs\nDifferent issues arise at different stages of an AI system. Micro-benchmarks help fine-tune individual components, macro-benchmarks aid in refining model architectures or algorithms, and end-to-end benchmarks guide the optimization of the entire workflow. By understanding where a problem lies, developers can apply targeted optimizations.\nMoreover, while individual components of an AI system might perform optimally in isolation, bottlenecks can emerge when they interact. End-to-end benchmarks, in particular, are crucial to ensure that the entire system, when operating collectively, meets desired performance and efficiency standards.\nFinally, organizations can make informed decisions on where to allocate resources by discerning performance bottlenecks or inefficiencies. For instance, if micro-benchmarks reveal inefficiencies in specific tensor operations, investments can be directed toward specialized hardware accelerators. Conversely, if end-to-end benchmarks indicate data retrieval issues, investments might be channeled toward better storage solutions.\n\n\n\n11.4.2 Benchmark Components\nAt its core, an AI benchmark is more than just a test or a score; it’s a comprehensive evaluation framework. To understand this in-depth, let’s break down the typical components that go into an AI benchmark.\n\nStandardized Datasets\nDatasets serve as the foundation for most AI benchmarks. They provide a consistent data set on which models are trained and evaluated, ensuring a level playing field for comparisons.\nExample: ImageNet, a large-scale dataset containing millions of labeled images spanning thousands of categories, is a popular benchmarking standard for image classification tasks.\n\n\nPre-defined Tasks\nA benchmark should have a clear objective or task that models aim to achieve. This task defines the problem the AI system is trying to solve.\nExample: Tasks for natural language processing benchmarks might include sentiment analysis, named entity recognition, or machine translation.\n\n\nEvaluation Metrics\nOnce a task is defined, benchmarks require metrics to quantify performance. These metrics offer objective measures to compare different models or systems.\nIn classification tasks, metrics like accuracy, precision, recall, and F1 score are commonly used. Mean squared or absolute errors might be employed for regression tasks.\n\n\nBaseline Models\nBenchmarks often include baseline models or reference implementations. These serve as starting points or minimum performance standards against which new models or techniques can be compared.\nExample: In many benchmark suites, simple models like linear regression or basic neural networks serve as baselines to provide context for more complex model evaluations.\n\n\nHardware and Software Specifications\nGiven the variability introduced by different hardware and software configurations, benchmarks often specify or document the hardware and software environments in which tests are conducted.\nExample: An AI benchmark might note that evaluations were conducted on an NVIDIA Tesla V100 GPU using TensorFlow v2.4.\n\n\nEnvironmental Conditions\nAs external factors can influence benchmark results, it’s essential to either control or document conditions like temperature, power source, or system background processes.\nExample: Mobile AI benchmarks might specify that tests were conducted at room temperature with devices plugged into a power source to eliminate battery-level variances.\n\n\nReproducibility Rules\nTo ensure benchmarks are credible and can be replicated by others in the community, they often include detailed protocols covering everything from random seeds used to exact hyperparameters.\nExample: A benchmark for a reinforcement learning task might detail the exact training episodes, exploration-exploitation ratios, and reward structures used.\n\n\nResult Interpretation Guidelines\nBeyond raw scores or metrics, benchmarks often provide guidelines or context to interpret results, helping practitioners understand the broader implications.\nExample: A benchmark might highlight that while Model A scored higher than Model B in accuracy, it offers better real-time performance, making it more suitable for time-sensitive applications.\n\n\n\n11.4.3 Training vs. Inference\nThe development life cycle of a machine learning model involves two critical phases - training and inference. Training is the process of learning patterns from data to create the model. Inference refers to the model making predictions on new unlabeled data. Both phases play indispensable yet distinct roles. Consequently, each phase warrants rigorous benchmarking to evaluate performance metrics like speed, accuracy, and computational efficiency.\nBenchmarking the training phase provides insights into how different model architectures, hyperparameter values, and optimization algorithms impact the time and resources needed to train the model. For instance, benchmarking shows how neural network depth affects training time on a given dataset. Benchmarking also reveals how hardware accelerators like GPUs and TPUs can speed up training.\nOn the other hand, benchmarking inference evaluates model performance in real-world conditions after deployment. Key metrics include latency, throughput, memory footprint, and power consumption. Inference benchmarking determines if a model meets the requirements of its target application regarding response time and device constraints, which is typically the focus of TinyML. However, we will discuss these broadly to ensure a general understanding.\n\n\n11.4.4 Training Benchmarks\nTraining represents the phase where the system processes and ingests raw data to adjust and refine its parameters. Therefore, it is an algorithmic activity and involves system-level considerations, including data pipelines, storage, computing resources, and orchestration mechanisms. The goal is to ensure that the ML system can efficiently learn from data, optimizing both the model’s performance and the system’s resource utilization.\n\nPurpose\nFrom an ML systems perspective, training benchmarks evaluate how well the system scales with increasing data volumes and computational demands. It’s about understanding the interplay between hardware, software, and the data pipeline in the training process.\nConsider a distributed ML system designed to train on vast datasets, like those used in large-scale e-commerce product recommendations. A training benchmark would assess how efficiently the system scales across multiple nodes, manage data sharding and handle failures or node drop-offs during training.\nTraining benchmarks evaluate CPU, GPU, memory, and network utilization during the training phase, guiding system optimizations. When training a model in a cloud-based ML system, it’s crucial to understand how resources are being utilized. Are GPUs being fully leveraged? Is there unnecessary memory overhead? Benchmarks can highlight bottlenecks or inefficiencies in resource utilization, leading to cost savings and performance improvements.\nTraining an ML model is contingent on timely and efficient data delivery. Benchmarks in this context would also assess the efficiency of data pipelines, data preprocessing speed, and storage retrieval times. For real-time analytics systems, like those used in fraud detection, the speed at which training data is ingested, preprocessed, and fed into the model can be critical. Benchmarks would evaluate the latency of data pipelines, the efficiency of storage systems (like SSDs vs. HDDs), and the speed of data augmentation or transformation tasks.\n\n\nMetrics\nWhen viewed from a systems perspective, training metrics offer insights that transcend conventional algorithmic performance indicators. These metrics measure the model’s learning efficacy and gauge the efficiency, scalability, and robustness of the entire ML system during the training phase. Let’s delve deeper into these metrics and their significance.\nThe following metrics are often considered important:\n\nTraining Time: The time it takes to train a model from scratch until it reaches a satisfactory performance level. It directly measures the computational resources required to train a model. For example, Google’s BERT(Devlin et al. (2019)) is a natural language processing model that requires several days to train on a massive corpus of text data using multiple GPUs. The long training time is a significant resource consumption and cost challenge.\nScalability: How well the training process can handle increases in data size or model complexity. Scalability can be assessed by measuring training time, memory usage, and other resource consumption as data size or model complexity increases. OpenAI’s GPT-3(Brown et al. (2020)) model has 175 billion parameters, making it one of the largest language models in existence. Training GPT-3 required extensive engineering efforts to scale the training process to handle the massive model size. This involved using specialized hardware, distributed training, and other techniques to ensure the model could be trained efficiently.\nResource Utilization: The extent to which the training process utilizes available computational resources such as CPU, GPU, memory, and disk I/O. High resource utilization can indicate an efficient training process, while low utilization can suggest bottlenecks or inefficiencies. For instance, training a convolutional neural network (CNN) for image classification requires significant GPU resources. Utilizing multi-GPU setups and optimizing the training code for GPU acceleration can greatly improve resource utilization and training efficiency.\nMemory Consumption: The amount of memory the training process uses. Memory consumption can be a limiting factor for training large models or datasets. For example, Google researchers faced significant memory consumption challenges when training BERT. The model has hundreds of millions of parameters, requiring large amounts of memory. The researchers had to develop techniques to reduce memory consumption, such as gradient checkpointing and model parallelism.\n** Energy Consumption: ** The energy consumed during training. As machine learning models become more complex, energy consumption has become an important consideration. Training large machine learning models can consume significant energy, leading to a large carbon footprint. For instance, the training of OpenAI’s GPT-3 was estimated to have a carbon footprint equivalent to traveling by car for 700,000 kilometers.\nThroughput: The number of training samples processed per unit time. Higher throughput generally indicates a more efficient training process. The throughput is an important metric to consider when training a recommendation system for an e-commerce platform. A high throughput ensures that the model can process large volumes of user interaction data promptly, which is crucial for maintaining the relevance and accuracy of the recommendations. But it’s also important to understand how to balance throughput with latency bounds. Therefore, a latency-bounded throughput constraint is often imposed on service-level agreements for data center application deployments.\nCost: The cost of training a model can include both computational and human resources. Cost is important when considering the practicality and feasibility of training large or complex models. Training large language models like GPT-3 is estimated to cost millions of dollars. This cost includes computational, electricity and human resources required for model development and training.\nFault Tolerance and Robustness: The ability of the training process to handle failures or errors without crashing or producing incorrect results. This is important for ensuring the reliability of the training process. Network failures or hardware malfunctions can occur in a real-world scenario where a machine-learning model is being trained on a distributed system. In recent years, it has become abundantly clear that faults arising from silent data corruption have emerged as a major issue. A fault-tolerant and robust training process can recover from such failures without compromising the model’s integrity.\nEase of Use and Flexibility: The ease with which the training process can be set up and used and its flexibility in handling different types of data and models. In companies like Google, efficiency can sometimes be measured by the number of Software Engineer (SWE) years saved since that translates directly to impact. Ease of use and flexibility can reduce the time and effort required to train a model. TensorFlow and PyTorch are popular machine-learning frameworks that provide user-friendly interfaces and flexible APIs for building and training machine-learning models. These frameworks support many model architectures and are equipped with tools that simplify the training process.\nReproducibility: The ability to reproduce the training process results. Reproducibility is important for verifying a model’s correctness and validity. However, variations due to stochastic network characteristics often make it hard to reproduce the precise behavior of applications being trained, which can present a challenge for benchmarking.\n\n\nDevlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North, 4171–86. Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/n19-1423.\nBy benchmarking for these types of metrics, we can obtain a comprehensive view of the training process’s performance and efficiency from a systems perspective. This can help identify areas for improvement and ensure that resources are used effectively.\n\n\nTasks\nSelecting a handful of representative tasks for benchmarking machine learning systems is challenging because machine learning is applied to various domains with unique characteristics and requirements. Here are some of the challenges faced in selecting representative tasks:\n\nDiversity of Applications: Machine learning is used in numerous fields such as healthcare, finance, natural language processing, computer vision, and many more. Each field has specific tasks that may not be representative of other fields. For example, image classification tasks in computer vision may not be relevant to financial fraud detection.\nVariability in Data Types and Quality: Different tasks require different data types, such as text, images, videos, or numerical data. Data quality and availability can vary greatly between tasks, making it difficult to select tasks that are representative of the general challenges faced in machine learning.\nTask Complexity and Difficulty: The complexity of tasks varies greatly. Some are relatively straightforward, while others are highly complex and require sophisticated models and techniques. Selecting representative tasks that cover the complexities encountered in machine learning is challenging.\nEthical and Privacy Concerns: Some tasks may involve sensitive or private data, such as medical records or personal information. These tasks may have ethical and privacy concerns that need to be addressed, making them less suitable as representative tasks for benchmarking.\nScalability and Resource Requirements: Different tasks may have different scalability and resource requirements. Some tasks may require extensive computational resources, while others can be performed with minimal resources. Selecting tasks that represent the general resource requirements in machine learning is difficult.\nEvaluation Metrics: The metrics used to evaluate the performance of machine learning models vary between tasks. Some tasks may have well-established evaluation metrics, while others lack clear or standardized metrics. This can make it challenging to compare performance across different tasks.\nGeneralizability of Results: The results obtained from benchmarking on a specific task may not be generalizable to other tasks. This means that a machine learning system’s performance on a selected task may not be indicative of its performance on other tasks.\n\nIt is important to carefully consider these factors when designing benchmarks to ensure they are meaningful and relevant to the diverse range of tasks encountered in machine learning.\n\n\nBenchmarks\nHere are some original works that laid the fundamental groundwork for developing systematic benchmarks for training machine learning systems.\nMLPerf Training Benchmark\nMLPerf is a suite of benchmarks designed to measure the performance of machine learning hardware, software, and services. The MLPerf Training benchmark (Mattson et al. 2020a) focuses on the time it takes to train models to a target quality metric. It includes diverse workloads, such as image classification, object detection, translation, and reinforcement learning.\nMetrics:\n\nTraining time to target quality\nThroughput (examples per second)\nResource utilization (CPU, GPU, memory, disk I/O)\n\nDAWNBench\nDAWNBench (Coleman et al. 2019) is a benchmark suite focusing on end-to-end deep learning training time and inference performance. It includes common tasks such as image classification and question answering.\n\nColeman, Cody, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia. 2019. “Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark.” ACM SIGOPS Operating Systems Review 53 (1): 14–25. https://doi.org/10.1145/3352020.3352024.\nMetrics:\n\nTime to train to target accuracy\nInference latency\nCost (in terms of cloud computing and storage resources)\n\nFathom\nFathom (Adolf et al. 2016) is a benchmark from Harvard University that evaluates the performance of deep learning models using a diverse set of workloads. These include common tasks such as image classification, speech recognition, and language modeling.\n\nAdolf, Robert, Saketh Rama, Brandon Reagen, Gu-yeon Wei, and David Brooks. 2016. “Fathom: Reference Workloads for Modern Deep Learning Methods.” In 2016 IEEE International Symposium on Workload Characterization (IISWC), 1–10. IEEE; IEEE. https://doi.org/10.1109/iiswc.2016.7581275.\nMetrics:\n\nOperations per second (to measure computational efficiency)\nTime to completion for each workload\nMemory bandwidth\n\nExample Use Case\nConsider a scenario where we want to benchmark the training of an image classification model on a specific hardware platform.\n\nTask: The task is to train a convolutional neural network (CNN) for image classification on the CIFAR-10 dataset.\nBenchmark: We can use the MLPerf Training benchmark for this task. It includes an image classification workload that is relevant to our task.\nMetrics: We will measure the following metrics:\n\n\nTraining time to reach a target accuracy of 90%.\nThroughput in terms of images processed per second.\nGPU and CPU utilization during training.\n\nBy measuring these metrics, we can assess the performance and efficiency of the training process on the selected hardware platform. This information can then be used to identify potential bottlenecks or areas for improvement.\n\n\n\n11.4.5 Inference Benchmarks\nInference in machine learning refers to using a trained model to make predictions on new, unseen data. It is the phase where the model applies its learned knowledge to solve the problem it was designed for, such as classifying images, recognizing speech, or translating text.\n\nPurpose\nWhen we build machine learning models, our ultimate goal is to deploy them in real-world applications where they can provide accurate and reliable predictions on new, unseen data. This process of using a trained model to make predictions is known as inference. A machine learning model’s real-world performance can differ significantly from its performance on training or validation datasets, which makes benchmarking inference a crucial step in the development and deployment of machine learning models.\nBenchmarking inference allows us to evaluate how well a machine-learning model performs in real-world scenarios. This evaluation ensures that the model is practical and reliable when deployed in applications, providing a more comprehensive understanding of the model’s behavior with real data. Additionally, benchmarking can help identify potential bottlenecks or limitations in the model’s performance. For example, if a model takes less time to predict, it may be impractical for real-time applications such as autonomous driving or voice assistants.\nResource efficiency is another critical aspect of inference, as it can be computationally intensive and require significant memory and processing power. Benchmarking helps ensure that the model is efficient regarding resource usage, which is particularly important for edge devices with limited computational capabilities, such as smartphones or IoT devices. Moreover, benchmarking allows us to compare the performance of our model with competing models or previous versions of the same model. This comparison is essential for making informed decisions about which model to deploy in a specific application.\nFinally, it is vital to ensure that the model’s predictions are not only accurate but also consistent across different data points. Benchmarking helps verify the model’s accuracy and consistency, ensuring that it meets the application’s requirements. It also assesses the model’s robustness, ensuring that it can handle real-world data variability and still make accurate predictions.\n\n\nMetrics\n\nAccuracy: Accuracy is one of the most vital metrics when benchmarking machine learning models. It quantifies the proportion of correct predictions made by the model compared to the true values or labels. For example, if a spam detection model can correctly classify 95 out of 100 email messages as spam or not, its accuracy would be calculated as 95%.\nLatency: Latency is a performance metric that calculates the time lag or delay between the input receipt and the production of the corresponding output by the machine learning system. An example that clearly depicts latency is a real-time translation application; if a half-second delay exists from the moment a user inputs a sentence to the time the app displays the translated text, then the system’s latency is 0.5 seconds.\nLatency-Bounded Throughput: Latency-bounded throughput is a valuable metric that combines the aspects of latency and throughput, measuring the maximum throughput of a system while still meeting a specified latency constraint. For example, in a video streaming application that utilizes a machine learning model to generate and display subtitles automatically, latency-bounded throughput would measure how many video frames the system can process per second (throughput) while ensuring that the subtitles are displayed with no more than a 1-second delay (latency). This metric is particularly important in real-time applications where meeting latency requirements is crucial to the user experience.\nThroughput: Throughput assesses the system’s capacity by measuring the number of inferences or predictions a machine learning model can handle within a specific unit of time. Consider a speech recognition system that employs a Recurrent Neural Network (RNN) as its underlying model; if this system can process and understand 50 different audio clips in a minute, then its throughput rate stands at 50 clips per minute.\nInference Time: Inference time is a crucial metric that measures the duration a machine learning system, such as a Convolutional Neural Network (CNN) used in image recognition tasks, takes to process an input and generate a prediction or output. For instance, if a CNN takes approximately 2 milliseconds to identify and label a cat within a given photo accurately, then its inference time is said to be 2 milliseconds.\nEnergy Efficiency: Energy efficiency is a metric that determines the amount of energy consumed by the machine learning model to perform a single inference. A prime example of this would be a natural language processing model built on a Transformer network architecture; if it utilizes 0.1 Joules of energy to translate a sentence from English to French, its energy efficiency is measured at 0.1 Joules per inference.\nMemory Usage: Memory usage quantifies the volume of RAM needed by a machine learning model to carry out inference tasks. A relevant example to illustrate this would be a face recognition system based on a CNN; if such a system requires 150 MB of RAM to process and recognize faces within an image, its memory usage is 150 MB.\n\n\n\nTasks\nThe challenges in picking representative tasks for benchmarking inference machine learning systems are, by and large, somewhat similar to the taxonomy we have provided for training. Nevertheless, to be pedantic, let’s discuss those in the context of inference machine learning systems.\n\nDiversity of Applications: Inference machine learning is employed across numerous domains such as healthcare, finance, entertainment, security, and more. Each domain has unique tasks, and what’s representative in one domain might not be in another. For example, an inference task for predicting stock prices in the financial domain might differ from image recognition tasks in the medical domain.\nVariability in Data Types: Different inference tasks require different types of data—text, images, videos, numerical data, etc. Ensuring that benchmarks address the wide variety of data types used in real-world applications is challenging. For example, voice recognition systems process audio data, which is vastly different from the visual data processed by facial recognition systems.\nTask Complexity: The complexity of inference tasks can differ immensely, from basic classification tasks to intricate tasks requiring state-of-the-art models. For example, differentiating between two categories (binary classification) is typically simpler than detecting hundreds of object types in a crowded scene.\nReal-time Requirements: Some applications demand immediate or real-time responses, while others may allow for some delay. In autonomous driving, real-time object detection and decision-making are paramount, whereas a recommendation engine for a shopping website might tolerate slight delays.\nScalability Concerns: Given the varied scale of applications, from edge devices to cloud-based servers, tasks must represent the diverse computational environments where inference occurs. For example, an inference task running on a smartphone’s limited resources differs from a powerful cloud server.\nEvaluation Metrics Diversity: The metrics used to evaluate performance can differ significantly depending on the task. Finding a common ground or universally accepted metric for diverse tasks is challenging. For example, precision and recall might be vital for a medical diagnosis task, whereas throughput (inferences per second) might be more crucial for video processing tasks.\nEthical and Privacy Concerns: Concerns related to ethics and privacy exist, especially in sensitive areas like facial recognition or personal data processing. These concerns can impact the selection and nature of tasks used for benchmarking. For example, using real-world facial data for benchmarking can raise privacy issues, whereas synthetic data might not replicate real-world challenges.\nHardware Diversity: With a wide range of devices from GPUs, CPUs, and TPUs to custom ASICs used for inference, ensuring that tasks are representative across varied hardware is challenging. For example, a task optimized for inference on a GPU might perform sub-optimally on an edge device.\n\n\n\nBenchmarks\nHere are some original works that laid the fundamental groundwork for developing systematic benchmarks for inference machine learning systems.\nMLPerf Inference Benchmark\nMLPerf Inference is a comprehensive benchmark suite that assesses machine learning models’ performance during the inference phase. It encompasses a variety of workloads, including image classification, object detection, and natural language processing, aiming to provide standardized and insightful metrics for evaluating different inference systems.\nMetrics:\n\nInference time\nLatency\nThroughput\nAccuracy\nEnergy consumption\n\nAI Benchmark\nAI Benchmark is a benchmarking tool that evaluates the performance of AI and machine learning models on mobile devices and edge computing platforms. It includes tests for image classification, object detection, and natural language processing tasks, providing a detailed analysis of the inference performance on different hardware platforms.\nMetrics:\n\nInference time\nLatency\nEnergy consumption\nMemory usage\nThroughput\n\nOpenVINO toolkit\nOpenVINO toolkit provides a benchmark tool to measure the performance of deep learning models for various tasks, such as image classification, object detection, and facial recognition, on Intel hardware. It offers detailed insights into the models’ inference performance on different hardware configurations.\nMetrics:\n\nInference time\nThroughput\nLatency\nCPU and GPU utilization\n\nExample Use Case\nConsider a scenario where we want to evaluate the inference performance of an object detection model on a specific edge device.\nTask: The task is to perform real-time object detection on video streams, detecting and identifying objects such as vehicles, pedestrians, and traffic signs.\nBenchmark: We can use the AI Benchmark for this task as it evaluates inference performance on edge devices, which suits our scenario.\nMetrics: We will measure the following metrics:\n\nInference time to process each video frame\nLatency to generate the bounding boxes for detected objects\nEnergy consumption during the inference process\nThroughput in terms of video frames processed per second\n\nBy measuring these metrics, we can assess the performance of the object detection model on the edge device and identify any potential bottlenecks or areas for optimization to enhance real-time processing capabilities.\n\n\n\n\n\n\nExercise 11.2: Inference Benchmarks - MLPerf\n\n\n\n\n\nGet ready to put your AI models to the ultimate test! MLPerf is like the Olympics for machine learning performance. In this Colab, we’ll use a toolkit called CK to run official MLPerf benchmarks, measure how fast and accurate your model is, and even use TVM to give it a super speed boost. Are you ready to see your model earn its medal?\n\n\n\n\n\n\n\n11.4.6 Benchmark Example\nTo properly illustrate the components of a systems benchmark, we can look at the keyword spotting benchmark in MLPerf Tiny and explain the motivation behind each decision.\n\nTask\nKeyword spotting was selected as a task because it is a common use case in TinyML that has been well-established for years. Additionally, the typical hardware used for keyword spotting differs substantially from the offerings of other benchmarks, such as MLPerf Inference’s speech recognition task.\n\n\nDataset\nGoogle Speech Commands(Warden (2018)) was selected as the best dataset to represent the task. The dataset is well-established in the research community and has permissive licensing, allowing it to be easily used in a benchmark.\n\nWarden, Pete. 2018. “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition.” ArXiv Preprint abs/1804.03209. https://arxiv.org/abs/1804.03209.\n\n\nModel\nThe next core component is the model, which will act as the primary workload for the benchmark. The model should be well established as a solution to the selected task rather than a state-of-the-art solution. The model selected is a simple depthwise separable convolution model. This architecture is not the state-of-the-art solution to the task, but it is well-established and not designed for a specific hardware platform like many state-of-the-art solutions. Despite being an inference benchmark, the benchmark also establishes a reference training recipe to be fully reproducible and transparent.\n\n\nMetrics\nLatency was selected as the primary metric for the benchmark, as keyword spotting systems need to react quickly to maintain user satisfaction. Additionally, given that TinyML systems are often battery-powered, energy consumption is measured to ensure the hardware platform is efficient. The accuracy of the model is also measured to ensure that the optimizations applied by a submitter, such as quantization, don’t degrade the accuracy beyond a threshold.\n\n\nBenchmark Harness\nMLPerf Tiny uses EEMBCs EnergyRunner benchmark harness to load the inputs to the model and isolate and measure the device’s energy consumption. When measuring energy consumption, it’s critical to select a harness that is accurate at the expected power levels of the devices under test and simple enough not to become a burden for the benchmark participants.\n\n\nBaseline Submission\nBaseline submissions are critical for contextualizing results and as a reference point to help participants get started. The baseline submission should prioritize simplicity and readability over state-of-the-art performance. The keyword spotting baseline uses a standard STM microcontroller as its hardware and TensorFlow Lite for Microcontrollers(David et al. (2021)) as its inference framework.\n\nDavid, Robert, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, et al. 2021. “Tensorflow Lite Micro: Embedded Machine Learning for Tinyml Systems.” Proceedings of Machine Learning and Systems 3: 800–811.\n\n\n\n11.4.7 Challenges and Limitations\nWhile benchmarking provides a structured methodology for performance evaluation in complex domains like artificial intelligence and computing, the process also poses several challenges. If not properly addressed, these challenges can undermine the credibility and accuracy of benchmarking results. Some of the predominant difficulties faced in benchmarking include the following:\n\nIncomplete problem coverage—Benchmark tasks may not fully represent the problem space. For instance, common image classification datasets like CIFAR-10 have limited diversity in image types. Algorithms tuned for such benchmarks may fail to generalize well to real-world datasets.\nStatistical insignificance - Benchmarks must have enough trials and data samples to produce statistically significant results. For example, benchmarking an OCR model on only a few text scans may not adequately capture its true error rates.\nLimited reproducibility—Varying hardware, software versions, codebases, and other factors can reduce the reproducibility of benchmark results. MLPerf addresses this by providing reference implementations and environment specifications.\nMisalignment with end goals - Benchmarks focusing only on speed or accuracy metrics may misalign real-world objectives like cost and power efficiency. Benchmarks must reflect all critical performance axes.\nRapid staleness—Due to the rapid pace of advancements in AI and computing, benchmarks and their datasets can quickly become outdated. Maintaining up-to-date benchmarks is thus a persistent challenge.\n\nBut of all these, the most important challenge is benchmark engineering.\n\nHardware Lottery\nThe “hardware lottery” in benchmarking machine learning systems refers to the situation where the success or efficiency of a machine learning model is significantly influenced by the compatibility of the model with the underlying hardware (Chu et al. 2021). In other words, some models perform exceptionally well because they are a good fit for the particular characteristics or capabilities of the hardware they are run on rather than because they are intrinsically superior models. Figure 11.2 demonstrates the performance of different models on different hardware: notice how (follow the big yellow arrow) the Mobilenet V3 Large model (in green) has the lowest latency among all models when run unquantized on the Pixel4 CPU. At the same time, it performs the worst on Pixel4 DSP Qualcomm Snapdragon 855. Unfortunately, the hardware used is often omitted from papers or only briefly mentioned, making reproducing results difficult, if possible.\n\nChu, Grace, Okan Arikan, Gabriel Bender, Weijun Wang, Achille Brighton, Pieter-Jan Kindermans, Hanxiao Liu, Berkin Akin, Suyog Gupta, and Andrew Howard. 2021. “Discovering Multi-Hardware Mobile Models via Architecture Search.” In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 3022–31. IEEE. https://doi.org/10.1109/cvprw53098.2021.00337.\n\n\n\n\n\n\nFigure 11.2: Hardware Lottery.\n\n\n\nFor instance, certain machine learning models may be designed and optimized to take advantage of the parallel processing capabilities of specific hardware accelerators, such as Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs). As a result, these models might show superior performance when benchmarked on such hardware compared to other models that are not optimized for the hardware.\nFor example, a 2018 paper introduced a new convolutional neural network architecture for image classification that achieved state-of-the-art accuracy on ImageNet. However, the paper only mentioned that the model was trained on 8 GPUs without specifying the model, memory size, or other relevant details. A follow-up study tried to reproduce the results but found that training the same model on commonly available GPUs achieved 10% lower accuracy, even after hyperparameter tuning. The original hardware likely had far higher memory bandwidth and compute power. As another example, training times for large language models can vary drastically based on the GPUs used.\nThe “hardware lottery” can introduce challenges and biases in benchmarking machine learning systems, as the model’s performance is not solely dependent on the model’s architecture or algorithm but also on the compatibility and synergies with the underlying hardware. This can make it difficult to compare different models fairly and to identify the best model based on its intrinsic merits. It can also lead to a situation where the community converges on models that are a good fit for the popular hardware of the day, potentially overlooking other models that might be superior but incompatible with the current hardware trends.\n\n\nBenchmark Engineering\nHardware lottery occurs when a machine learning model unintentionally performs exceptionally well or poorly on a specific hardware setup due to unforeseen compatibility or incompatibility. The model is not explicitly designed or optimized for that particular hardware by the developers or engineers; rather, it happens to align or (mis)align with the hardware’s capabilities or limitations. In this case, the model’s performance on the hardware is a byproduct of coincidence rather than design.\nIn contrast to the accidental hardware lottery, benchmark engineering involves deliberately optimizing or designing a machine learning model to perform exceptionally well on specific hardware, often to win benchmarks or competitions. This intentional optimization might include tweaking the model’s architecture, algorithms, or parameters to exploit the hardware’s features and capabilities fully.\n\n\nProblem\nBenchmark engineering refers to tweaking or modifying an AI system to optimize performance on specific benchmark tests, often at the expense of generalizability or real-world performance. This can include adjusting hyperparameters, training data, or other aspects of the system specifically to achieve high scores on benchmark metrics without necessarily improving the overall functionality or utility of the system.\nThe motivation behind benchmark engineering often stems from the desire to achieve high-performance scores for marketing or competitive purposes. High benchmark scores can demonstrate the superiority of an AI system compared to competitors and can be a key selling point for potential users or investors. This pressure to perform well on benchmarks sometimes leads to prioritizing benchmark-specific optimizations over more holistic improvements to the system.\nIt can lead to several risks and challenges. One of the primary risks is that the AI system may perform better in real-world applications than the benchmark scores suggest. This can lead to user dissatisfaction, reputational damage, and potential safety or ethical concerns. Furthermore, benchmark engineering can contribute to a lack of transparency and accountability in the AI community, as it can be difficult to discern how much of an AI system’s performance is due to genuine improvements versus benchmark-specific optimizations.\nThe AI community must prioritize transparency and accountability to mitigate the risks associated with benchmark engineering. This can include disclosing any optimizations or adjustments made specifically for benchmark tests and providing more comprehensive evaluations of AI systems that include real-world performance metrics and benchmark scores. Researchers and developers must prioritize holistic improvements to AI systems that improve their generalizability and functionality across various applications rather than focusing solely on benchmark-specific optimizations.\n\n\nIssues\nOne of the primary problems with benchmark engineering is that it can compromise the real-world performance of AI systems. When developers focus on optimizing their systems to achieve high scores on specific benchmark tests, they may neglect other important system performance aspects crucial in real-world applications. For example, an AI system designed for image recognition might be engineered to perform exceptionally well on a benchmark test that includes a specific set of images but needs help to recognize images slightly different from those in the test set accurately.\nAnother area for improvement with benchmark engineering is that it can result in AI systems that lack generalizability. In other words, while the system may perform well on the benchmark test, it may need help handling a diverse range of inputs or scenarios. For instance, an AI model developed for natural language processing might be engineered to achieve high scores on a benchmark test that includes a specific type of text but fails to process text that falls outside of that specific type accurately.\nIt can also lead to misleading results. When AI systems are engineered to perform well on benchmark tests, the results may not accurately reflect the system’s true capabilities. This can be problematic for users or investors who rely on benchmark scores to make informed decisions about which AI systems to use or invest in. For example, an AI system engineered to achieve high scores on a benchmark test for speech recognition might need to be more capable of accurately recognizing speech in real-world situations, leading users or investors to make decisions based on inaccurate information.\n\n\nMitigation\nThere are several ways to mitigate benchmark engineering. Transparency in the benchmarking process is crucial to maintaining benchmark accuracy and reliability. This involves clearly disclosing the methodologies, data sets, and evaluation criteria used in benchmark tests, as well as any optimizations or adjustments made to the AI system for the purpose of the benchmark.\nOne way to achieve transparency is through the use of open-source benchmarks. Open-source benchmarks are made publicly available, allowing researchers, developers, and other stakeholders to review, critique, and contribute to them, thereby ensuring their accuracy and reliability. This collaborative approach also facilitates sharing best practices and developing more robust and comprehensive benchmarks.\nOne example is the MLPerf Tiny. It’s an open-source framework designed to make it easy to compare different solutions in the world of TinyML. Its modular design allows components to be swapped out for comparison or improvement. The reference implementations, shown in green and orange in Figure 11.3, act as the baseline for results. TinyML often needs optimization across the entire system, and users can contribute by focusing on specific parts, like quantization. The modular benchmark design allows users to showcase their contributions and competitive advantage by modifying a reference implementation. In short, MLPerf Tiny offers a flexible and modular way to assess and enhance TinyML applications, making it easier to compare and improve different aspects of the technology.\n\n\n\n\n\n\nFigure 11.3: MLPerf Tiny modular design. Credit: Mattson et al. (2020a).\n\n\n———, et al. 2020a. “MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance.” IEEE Micro 40 (2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\n\n\nAnother method for achieving transparency is through peer review of benchmarks. This involves having independent experts review and validate the benchmark’s methodology, data sets, and results to ensure their credibility and reliability. Peer review can provide a valuable means of verifying the accuracy of benchmark tests and help build confidence in the results.\nStandardization of benchmarks is another important solution to mitigate benchmark engineering. Standardized benchmarks provide a common framework for evaluating AI systems, ensuring consistency and comparability across different systems and applications. This can be achieved by developing industry-wide standards and best practices for benchmarking and through common metrics and evaluation criteria.\nThird-party verification of results can also be valuable in mitigating benchmark engineering. This involves having an independent third party verify the results of a benchmark test to ensure their credibility and reliability. Third-party verification can build confidence in the results and provide a valuable means of validating the performance and capabilities of AI systems.", + "text": "11.4 System Benchmarking\n\n11.4.1 Granularity\nMachine learning system benchmarking provides a structured and systematic approach to assessing a system’s performance across various dimensions. Given the complexity of ML systems, we can dissect their performance through different levels of granularity and obtain a comprehensive view of the system’s efficiency, identify potential bottlenecks, and pinpoint areas for improvement. To this end, various types of benchmarks have evolved over the years and continue to persist.\nFigure 11.1 illustrates the different layers of granularity of an ML system. At the application level, end-to-end benchmarks assess the overall system performance, considering factors like data preprocessing, model training, and inference. While at the model layer, benchmarks focus on assessing the efficiency and accuracy of specific models. This includes evaluating how well models generalize to new data and their computational efficiency during training and inference. Furthermore, benchmarking can extend to hardware and software infrastructure, examining the performance of individual components like GPUs or TPUs.\n\n\n\n\n\n\nFigure 11.1: ML system granularity.\n\n\n\n\nMicro Benchmarks\nMicro-benchmarks in AI are specialized, evaluating distinct components or specific operations within a broader machine learning process. These benchmarks zero in on individual tasks, offering insights into the computational demands of a particular neural network layer, the efficiency of a unique optimization technique, or the throughput of a specific activation function. For instance, practitioners might use micro-benchmarks to measure the computational time required by a convolutional layer in a deep learning model or to evaluate the speed of data preprocessing that feeds data into the model. Such granular assessments are instrumental in fine-tuning and optimizing discrete aspects of AI models, ensuring that each component operates at its peak potential.\nThese types of microbenchmarks include zooming into very specific operations or components of the AI pipeline, such as the following:\n\nTensor Operations: Libraries like cuDNN (by NVIDIA) often have benchmarks to measure the performance of individual tensor operations, such as convolutions or matrix multiplications, which are foundational to deep learning computations.\nActivation Functions: Benchmarks that measure the speed and efficiency of various activation functions like ReLU, Sigmoid, or Tanh in isolation.\nLayer Benchmarks: Evaluations of the computational efficiency of distinct neural network layers, such as LSTM or Transformer blocks, when operating on standardized input sizes.\n\nExample: DeepBench, introduced by Baidu, is a good example of something that assesses the above. DeepBench assesses the performance of basic operations in deep learning models, providing insights into how different hardware platforms handle neural network training and inference.\n\n\n\n\n\n\nExercise 11.1: System Benchmarking - Tensor Operations\n\n\n\n\n\nEver wonder how your image filters get so fast? Special libraries like cuDNN supercharge those calculations on certain hardware. In this Colab, we’ll use cuDNN with PyTorch to speed up image filtering. Think of it as a tiny benchmark, showing how the right software can unlock your GPU’s power!\n\n\n\n\n\n\nMacro Benchmarks\nMacro benchmarks provide a holistic view, assessing the end-to-end performance of entire machine learning models or comprehensive AI systems. Rather than focusing on individual operations, macro-benchmarks evaluate the collective efficacy of models under real-world scenarios or tasks. For example, a macro-benchmark might assess the complete performance of a deep learning model undertaking image classification on a dataset like ImageNet. This includes gauging accuracy, computational speed, and resource consumption. Similarly, one might measure the cumulative time and resources needed to train a natural language processing model on extensive text corpora or evaluate the performance of an entire recommendation system, from data ingestion to final user-specific outputs.\nExamples: These benchmarks evaluate the AI model:\n\nMLPerf Inference(Reddi et al. (2020)): An industry-standard set of benchmarks for measuring the performance of machine learning software and hardware. MLPerf has a suite of dedicated benchmarks for specific scales, such as MLPerf Mobile for mobile class devices and MLPerf Tiny, which focuses on microcontrollers and other resource-constrained devices.\nEEMBC’s MLMark: A benchmarking suite for evaluating the performance and power efficiency of embedded devices running machine learning workloads. This benchmark provides insights into how different hardware platforms handle tasks like image recognition or audio processing.\nAI-Benchmark(Ignatov et al. (2019)): A benchmarking tool designed for Android devices, it evaluates the performance of AI tasks on mobile devices, encompassing various real-world scenarios like image recognition, face parsing, and optical character recognition.\n\n\nReddi, Vijay Janapa, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, et al. 2020. “MLPerf Inference Benchmark.” In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 446–59. IEEE; IEEE. https://doi.org/10.1109/isca45697.2020.00045.\n\nIgnatov, Andrey, Radu Timofte, Andrei Kulik, Seungsoo Yang, Ke Wang, Felix Baum, Max Wu, Lirong Xu, and Luc Van Gool. 2019. “AI Benchmark: All about Deep Learning on Smartphones in 2019.” In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 0–0. IEEE. https://doi.org/10.1109/iccvw.2019.00447.\n\n\nEnd-to-end Benchmarks\nEnd-to-end benchmarks provide an all-inclusive evaluation that extends beyond the boundaries of the AI model itself. Instead of focusing solely on a machine learning model’s computational efficiency or accuracy, these benchmarks encompass the entire pipeline of an AI system. This includes initial data preprocessing, the core model’s performance, post-processing of the model’s outputs, and other integral components like storage and network interactions.\nData preprocessing is the first stage in many AI systems, transforming raw data into a format suitable for model training or inference. These preprocessing steps’ efficiency, scalability, and accuracy are vital for the overall system’s performance. End-to-end benchmarks assess this phase, ensuring that data cleaning, normalization, augmentation, or any other transformation process doesn’t become a bottleneck.\nThe post-processing phase also takes center stage. This involves interpreting the model’s raw outputs, possibly converting scores into meaningful categories, filtering results, or even integrating with other systems. In real-world applications, this phase is crucial for delivering actionable insights, and end-to-end benchmarks ensure it’s both efficient and effective.\nBeyond the core AI operations, other system components are important in the overall performance and user experience. Storage solutions, whether cloud-based, on-premises, or hybrid, can significantly impact data retrieval and storage times, especially with vast AI datasets. Similarly, network interactions, vital for cloud-based AI solutions or distributed systems, can become performance bottlenecks if not optimized. End-to-end benchmarks holistically evaluate these components, ensuring that the entire system operates seamlessly, from data retrieval to final output delivery.\nTo date, there are no public, end-to-end benchmarks that take into account the role of data storage, network, and compute performance. Arguably, MLPerf Training and Inference come close to the idea of an end-to-end benchmark, but they are exclusively focused on ML model performance and do not represent real-world deployment scenarios of how models are used in the field. Nonetheless, they provide a very useful signal that helps assess AI system performance.\nGiven the inherent specificity of end-to-end benchmarking, it is typically performed internally at a company by instrumenting real production deployments of AI. This allows engineers to have a realistic understanding and breakdown of the performance, but given the sensitivity and specificity of the information, it is rarely reported outside of the company.\n\n\nUnderstanding the Trade-offs\nDifferent issues arise at different stages of an AI system. Micro-benchmarks help fine-tune individual components, macro-benchmarks aid in refining model architectures or algorithms, and end-to-end benchmarks guide the optimization of the entire workflow. By understanding where a problem lies, developers can apply targeted optimizations.\nMoreover, while individual components of an AI system might perform optimally in isolation, bottlenecks can emerge when they interact. End-to-end benchmarks, in particular, are crucial to ensure that the entire system, when operating collectively, meets desired performance and efficiency standards.\nFinally, organizations can make informed decisions on where to allocate resources by discerning performance bottlenecks or inefficiencies. For instance, if micro-benchmarks reveal inefficiencies in specific tensor operations, investments can be directed toward specialized hardware accelerators. Conversely, if end-to-end benchmarks indicate data retrieval issues, investments might be channeled toward better storage solutions.\n\n\n\n11.4.2 Benchmark Components\nAt its core, an AI benchmark is more than just a test or a score; it’s a comprehensive evaluation framework. To understand this in-depth, let’s break down the typical components that go into an AI benchmark.\n\nStandardized Datasets\nDatasets serve as the foundation for most AI benchmarks. They provide a consistent data set on which models are trained and evaluated, ensuring a level playing field for comparisons.\nExample: ImageNet, a large-scale dataset containing millions of labeled images spanning thousands of categories, is a popular benchmarking standard for image classification tasks.\n\n\nPre-defined Tasks\nA benchmark should have a clear objective or task that models aim to achieve. This task defines the problem the AI system is trying to solve.\nExample: Tasks for natural language processing benchmarks might include sentiment analysis, named entity recognition, or machine translation.\n\n\nEvaluation Metrics\nOnce a task is defined, benchmarks require metrics to quantify performance. These metrics offer objective measures to compare different models or systems.\nIn classification tasks, metrics like accuracy, precision, recall, and F1 score are commonly used. Mean squared or absolute errors might be employed for regression tasks.\n\n\nBaseline Models\nBenchmarks often include baseline models or reference implementations. These serve as starting points or minimum performance standards against which new models or techniques can be compared.\nExample: In many benchmark suites, simple models like linear regression or basic neural networks serve as baselines to provide context for more complex model evaluations.\n\n\nHardware and Software Specifications\nGiven the variability introduced by different hardware and software configurations, benchmarks often specify or document the hardware and software environments in which tests are conducted.\nExample: An AI benchmark might note that evaluations were conducted on an NVIDIA Tesla V100 GPU using TensorFlow v2.4.\n\n\nEnvironmental Conditions\nAs external factors can influence benchmark results, it’s essential to either control or document conditions like temperature, power source, or system background processes.\nExample: Mobile AI benchmarks might specify that tests were conducted at room temperature with devices plugged into a power source to eliminate battery-level variances.\n\n\nReproducibility Rules\nTo ensure benchmarks are credible and can be replicated by others in the community, they often include detailed protocols covering everything from random seeds used to exact hyperparameters.\nExample: A benchmark for a reinforcement learning task might detail the exact training episodes, exploration-exploitation ratios, and reward structures used.\n\n\nResult Interpretation Guidelines\nBeyond raw scores or metrics, benchmarks often provide guidelines or context to interpret results, helping practitioners understand the broader implications.\nExample: A benchmark might highlight that while Model A scored higher than Model B in accuracy, it offers better real-time performance, making it more suitable for time-sensitive applications.\n\n\n\n11.4.3 Training vs. Inference\nThe development life cycle of a machine learning model involves two critical phases - training and inference. Training is the process of learning patterns from data to create the model. Inference refers to the model making predictions on new unlabeled data. Both phases play indispensable yet distinct roles. Consequently, each phase warrants rigorous benchmarking to evaluate performance metrics like speed, accuracy, and computational efficiency.\nBenchmarking the training phase provides insights into how different model architectures, hyperparameter values, and optimization algorithms impact the time and resources needed to train the model. For instance, benchmarking shows how neural network depth affects training time on a given dataset. Benchmarking also reveals how hardware accelerators like GPUs and TPUs can speed up training.\nOn the other hand, benchmarking inference evaluates model performance in real-world conditions after deployment. Key metrics include latency, throughput, memory footprint, and power consumption. Inference benchmarking determines if a model meets the requirements of its target application regarding response time and device constraints, which is typically the focus of TinyML. However, we will discuss these broadly to ensure a general understanding.\n\n\n11.4.4 Training Benchmarks\nTraining represents the phase where the system processes and ingests raw data to adjust and refine its parameters. Therefore, it is an algorithmic activity and involves system-level considerations, including data pipelines, storage, computing resources, and orchestration mechanisms. The goal is to ensure that the ML system can efficiently learn from data, optimizing both the model’s performance and the system’s resource utilization.\n\nPurpose\nFrom an ML systems perspective, training benchmarks evaluate how well the system scales with increasing data volumes and computational demands. It’s about understanding the interplay between hardware, software, and the data pipeline in the training process.\nConsider a distributed ML system designed to train on vast datasets, like those used in large-scale e-commerce product recommendations. A training benchmark would assess how efficiently the system scales across multiple nodes, manage data sharding and handle failures or node drop-offs during training.\nTraining benchmarks evaluate CPU, GPU, memory, and network utilization during the training phase, guiding system optimizations. When training a model in a cloud-based ML system, it’s crucial to understand how resources are being utilized. Are GPUs being fully leveraged? Is there unnecessary memory overhead? Benchmarks can highlight bottlenecks or inefficiencies in resource utilization, leading to cost savings and performance improvements.\nTraining an ML model is contingent on timely and efficient data delivery. Benchmarks in this context would also assess the efficiency of data pipelines, data preprocessing speed, and storage retrieval times. For real-time analytics systems, like those used in fraud detection, the speed at which training data is ingested, preprocessed, and fed into the model can be critical. Benchmarks would evaluate the latency of data pipelines, the efficiency of storage systems (like SSDs vs. HDDs), and the speed of data augmentation or transformation tasks.\n\n\nMetrics\nWhen viewed from a systems perspective, training metrics offer insights that transcend conventional algorithmic performance indicators. These metrics measure the model’s learning efficacy and gauge the efficiency, scalability, and robustness of the entire ML system during the training phase. Let’s delve deeper into these metrics and their significance.\nThe following metrics are often considered important:\n\nTraining Time: The time it takes to train a model from scratch until it reaches a satisfactory performance level. It directly measures the computational resources required to train a model. For example, Google’s BERT(Devlin et al. (2019)) is a natural language processing model that requires several days to train on a massive corpus of text data using multiple GPUs. The long training time is a significant resource consumption and cost challenge.\nScalability: How well the training process can handle increases in data size or model complexity. Scalability can be assessed by measuring training time, memory usage, and other resource consumption as data size or model complexity increases. OpenAI’s GPT-3(Brown et al. (2020)) model has 175 billion parameters, making it one of the largest language models in existence. Training GPT-3 required extensive engineering efforts to scale the training process to handle the massive model size. This involved using specialized hardware, distributed training, and other techniques to ensure the model could be trained efficiently.\nResource Utilization: The extent to which the training process utilizes available computational resources such as CPU, GPU, memory, and disk I/O. High resource utilization can indicate an efficient training process, while low utilization can suggest bottlenecks or inefficiencies. For instance, training a convolutional neural network (CNN) for image classification requires significant GPU resources. Utilizing multi-GPU setups and optimizing the training code for GPU acceleration can greatly improve resource utilization and training efficiency.\nMemory Consumption: The amount of memory the training process uses. Memory consumption can be a limiting factor for training large models or datasets. For example, Google researchers faced significant memory consumption challenges when training BERT. The model has hundreds of millions of parameters, requiring large amounts of memory. The researchers had to develop techniques to reduce memory consumption, such as gradient checkpointing and model parallelism.\n** Energy Consumption: ** The energy consumed during training. As machine learning models become more complex, energy consumption has become an important consideration. Training large machine learning models can consume significant energy, leading to a large carbon footprint. For instance, the training of OpenAI’s GPT-3 was estimated to have a carbon footprint equivalent to traveling by car for 700,000 kilometers.\nThroughput: The number of training samples processed per unit time. Higher throughput generally indicates a more efficient training process. The throughput is an important metric to consider when training a recommendation system for an e-commerce platform. A high throughput ensures that the model can process large volumes of user interaction data promptly, which is crucial for maintaining the relevance and accuracy of the recommendations. But it’s also important to understand how to balance throughput with latency bounds. Therefore, a latency-bounded throughput constraint is often imposed on service-level agreements for data center application deployments.\nCost: The cost of training a model can include both computational and human resources. Cost is important when considering the practicality and feasibility of training large or complex models. Training large language models like GPT-3 is estimated to cost millions of dollars. This cost includes computational, electricity and human resources required for model development and training.\nFault Tolerance and Robustness: The ability of the training process to handle failures or errors without crashing or producing incorrect results. This is important for ensuring the reliability of the training process. Network failures or hardware malfunctions can occur in a real-world scenario where a machine-learning model is being trained on a distributed system. In recent years, it has become abundantly clear that faults arising from silent data corruption have emerged as a major issue. A fault-tolerant and robust training process can recover from such failures without compromising the model’s integrity.\nEase of Use and Flexibility: The ease with which the training process can be set up and used and its flexibility in handling different types of data and models. In companies like Google, efficiency can sometimes be measured by the number of Software Engineer (SWE) years saved since that translates directly to impact. Ease of use and flexibility can reduce the time and effort required to train a model. TensorFlow and PyTorch are popular machine-learning frameworks that provide user-friendly interfaces and flexible APIs for building and training machine-learning models. These frameworks support many model architectures and are equipped with tools that simplify the training process.\nReproducibility: The ability to reproduce the training process results. Reproducibility is important for verifying a model’s correctness and validity. However, variations due to stochastic network characteristics often make it hard to reproduce the precise behavior of applications being trained, which can present a challenge for benchmarking.\n\n\nDevlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North, 4171–86. Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/n19-1423.\nBy benchmarking for these types of metrics, we can obtain a comprehensive view of the training process’s performance and efficiency from a systems perspective. This can help identify areas for improvement and ensure that resources are used effectively.\n\n\nTasks\nSelecting a handful of representative tasks for benchmarking machine learning systems is challenging because machine learning is applied to various domains with unique characteristics and requirements. Here are some of the challenges faced in selecting representative tasks:\n\nDiversity of Applications: Machine learning is used in numerous fields such as healthcare, finance, natural language processing, computer vision, and many more. Each field has specific tasks that may not be representative of other fields. For example, image classification tasks in computer vision may not be relevant to financial fraud detection.\nVariability in Data Types and Quality: Different tasks require different data types, such as text, images, videos, or numerical data. Data quality and availability can vary greatly between tasks, making it difficult to select tasks that are representative of the general challenges faced in machine learning.\nTask Complexity and Difficulty: The complexity of tasks varies greatly. Some are relatively straightforward, while others are highly complex and require sophisticated models and techniques. Selecting representative tasks that cover the complexities encountered in machine learning is challenging.\nEthical and Privacy Concerns: Some tasks may involve sensitive or private data, such as medical records or personal information. These tasks may have ethical and privacy concerns that need to be addressed, making them less suitable as representative tasks for benchmarking.\nScalability and Resource Requirements: Different tasks may have different scalability and resource requirements. Some tasks may require extensive computational resources, while others can be performed with minimal resources. Selecting tasks that represent the general resource requirements in machine learning is difficult.\nEvaluation Metrics: The metrics used to evaluate the performance of machine learning models vary between tasks. Some tasks may have well-established evaluation metrics, while others lack clear or standardized metrics. This can make it challenging to compare performance across different tasks.\nGeneralizability of Results: The results obtained from benchmarking on a specific task may not be generalizable to other tasks. This means that a machine learning system’s performance on a selected task may not be indicative of its performance on other tasks.\n\nIt is important to carefully consider these factors when designing benchmarks to ensure they are meaningful and relevant to the diverse range of tasks encountered in machine learning.\n\n\nBenchmarks\nHere are some original works that laid the fundamental groundwork for developing systematic benchmarks for training machine learning systems.\nMLPerf Training Benchmark\nMLPerf is a suite of benchmarks designed to measure the performance of machine learning hardware, software, and services. The MLPerf Training benchmark (Mattson et al. 2020a) focuses on the time it takes to train models to a target quality metric. It includes diverse workloads, such as image classification, object detection, translation, and reinforcement learning.\nMetrics:\n\nTraining time to target quality\nThroughput (examples per second)\nResource utilization (CPU, GPU, memory, disk I/O)\n\nDAWNBench\nDAWNBench (Coleman et al. 2019) is a benchmark suite focusing on end-to-end deep learning training time and inference performance. It includes common tasks such as image classification and question answering.\n\nColeman, Cody, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia. 2019. “Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark.” ACM SIGOPS Operating Systems Review 53 (1): 14–25. https://doi.org/10.1145/3352020.3352024.\nMetrics:\n\nTime to train to target accuracy\nInference latency\nCost (in terms of cloud computing and storage resources)\n\nFathom\nFathom (Adolf et al. 2016) is a benchmark from Harvard University that evaluates the performance of deep learning models using a diverse set of workloads. These include common tasks such as image classification, speech recognition, and language modeling.\n\nAdolf, Robert, Saketh Rama, Brandon Reagen, Gu-yeon Wei, and David Brooks. 2016. “Fathom: Reference Workloads for Modern Deep Learning Methods.” In 2016 IEEE International Symposium on Workload Characterization (IISWC), 1–10. IEEE; IEEE. https://doi.org/10.1109/iiswc.2016.7581275.\nMetrics:\n\nOperations per second (to measure computational efficiency)\nTime to completion for each workload\nMemory bandwidth\n\nExample Use Case\nConsider a scenario where we want to benchmark the training of an image classification model on a specific hardware platform.\n\nTask: The task is to train a convolutional neural network (CNN) for image classification on the CIFAR-10 dataset.\nBenchmark: We can use the MLPerf Training benchmark for this task. It includes an image classification workload that is relevant to our task.\nMetrics: We will measure the following metrics:\n\n\nTraining time to reach a target accuracy of 90%.\nThroughput in terms of images processed per second.\nGPU and CPU utilization during training.\n\nBy measuring these metrics, we can assess the performance and efficiency of the training process on the selected hardware platform. This information can then be used to identify potential bottlenecks or areas for improvement.\n\n\n\n11.4.5 Inference Benchmarks\nInference in machine learning refers to using a trained model to make predictions on new, unseen data. It is the phase where the model applies its learned knowledge to solve the problem it was designed for, such as classifying images, recognizing speech, or translating text.\n\nPurpose\nWhen we build machine learning models, our ultimate goal is to deploy them in real-world applications where they can provide accurate and reliable predictions on new, unseen data. This process of using a trained model to make predictions is known as inference. A machine learning model’s real-world performance can differ significantly from its performance on training or validation datasets, which makes benchmarking inference a crucial step in the development and deployment of machine learning models.\nBenchmarking inference allows us to evaluate how well a machine-learning model performs in real-world scenarios. This evaluation ensures that the model is practical and reliable when deployed in applications, providing a more comprehensive understanding of the model’s behavior with real data. Additionally, benchmarking can help identify potential bottlenecks or limitations in the model’s performance. For example, if a model takes less time to predict, it may be impractical for real-time applications such as autonomous driving or voice assistants.\nResource efficiency is another critical aspect of inference, as it can be computationally intensive and require significant memory and processing power. Benchmarking helps ensure that the model is efficient regarding resource usage, which is particularly important for edge devices with limited computational capabilities, such as smartphones or IoT devices. Moreover, benchmarking allows us to compare the performance of our model with competing models or previous versions of the same model. This comparison is essential for making informed decisions about which model to deploy in a specific application.\nFinally, it is vital to ensure that the model’s predictions are not only accurate but also consistent across different data points. Benchmarking helps verify the model’s accuracy and consistency, ensuring that it meets the application’s requirements. It also assesses the model’s robustness, ensuring that it can handle real-world data variability and still make accurate predictions.\n\n\nMetrics\n\nAccuracy: Accuracy is one of the most vital metrics when benchmarking machine learning models. It quantifies the proportion of correct predictions made by the model compared to the true values or labels. For example, if a spam detection model can correctly classify 95 out of 100 email messages as spam or not, its accuracy would be calculated as 95%.\nLatency: Latency is a performance metric that calculates the time lag or delay between the input receipt and the production of the corresponding output by the machine learning system. An example that clearly depicts latency is a real-time translation application; if a half-second delay exists from the moment a user inputs a sentence to the time the app displays the translated text, then the system’s latency is 0.5 seconds.\nLatency-Bounded Throughput: Latency-bounded throughput is a valuable metric that combines the aspects of latency and throughput, measuring the maximum throughput of a system while still meeting a specified latency constraint. For example, in a video streaming application that utilizes a machine learning model to generate and display subtitles automatically, latency-bounded throughput would measure how many video frames the system can process per second (throughput) while ensuring that the subtitles are displayed with no more than a 1-second delay (latency). This metric is particularly important in real-time applications where meeting latency requirements is crucial to the user experience.\nThroughput: Throughput assesses the system’s capacity by measuring the number of inferences or predictions a machine learning model can handle within a specific unit of time. Consider a speech recognition system that employs a Recurrent Neural Network (RNN) as its underlying model; if this system can process and understand 50 different audio clips in a minute, then its throughput rate stands at 50 clips per minute.\nInference Time: Inference time is a crucial metric that measures the duration a machine learning system, such as a Convolutional Neural Network (CNN) used in image recognition tasks, takes to process an input and generate a prediction or output. For instance, if a CNN takes approximately 2 milliseconds to identify and label a cat within a given photo accurately, then its inference time is said to be 2 milliseconds.\nEnergy Efficiency: Energy efficiency is a metric that determines the amount of energy consumed by the machine learning model to perform a single inference. A prime example of this would be a natural language processing model built on a Transformer network architecture; if it utilizes 0.1 Joules of energy to translate a sentence from English to French, its energy efficiency is measured at 0.1 Joules per inference.\nMemory Usage: Memory usage quantifies the volume of RAM needed by a machine learning model to carry out inference tasks. A relevant example to illustrate this would be a face recognition system based on a CNN; if such a system requires 150 MB of RAM to process and recognize faces within an image, its memory usage is 150 MB.\n\n\n\nTasks\nThe challenges in picking representative tasks for benchmarking inference machine learning systems are, by and large, somewhat similar to the taxonomy we have provided for training. Nevertheless, to be pedantic, let’s discuss those in the context of inference machine learning systems.\n\nDiversity of Applications: Inference machine learning is employed across numerous domains such as healthcare, finance, entertainment, security, and more. Each domain has unique tasks, and what’s representative in one domain might not be in another. For example, an inference task for predicting stock prices in the financial domain might differ from image recognition tasks in the medical domain.\nVariability in Data Types: Different inference tasks require different types of data—text, images, videos, numerical data, etc. Ensuring that benchmarks address the wide variety of data types used in real-world applications is challenging. For example, voice recognition systems process audio data, which is vastly different from the visual data processed by facial recognition systems.\nTask Complexity: The complexity of inference tasks can differ immensely, from basic classification tasks to intricate tasks requiring state-of-the-art models. For example, differentiating between two categories (binary classification) is typically simpler than detecting hundreds of object types in a crowded scene.\nReal-time Requirements: Some applications demand immediate or real-time responses, while others may allow for some delay. In autonomous driving, real-time object detection and decision-making are paramount, whereas a recommendation engine for a shopping website might tolerate slight delays.\nScalability Concerns: Given the varied scale of applications, from edge devices to cloud-based servers, tasks must represent the diverse computational environments where inference occurs. For example, an inference task running on a smartphone’s limited resources differs from a powerful cloud server.\nEvaluation Metrics Diversity: The metrics used to evaluate performance can differ significantly depending on the task. Finding a common ground or universally accepted metric for diverse tasks is challenging. For example, precision and recall might be vital for a medical diagnosis task, whereas throughput (inferences per second) might be more crucial for video processing tasks.\nEthical and Privacy Concerns: Concerns related to ethics and privacy exist, especially in sensitive areas like facial recognition or personal data processing. These concerns can impact the selection and nature of tasks used for benchmarking. For example, using real-world facial data for benchmarking can raise privacy issues, whereas synthetic data might not replicate real-world challenges.\nHardware Diversity: With a wide range of devices from GPUs, CPUs, and TPUs to custom ASICs used for inference, ensuring that tasks are representative across varied hardware is challenging. For example, a task optimized for inference on a GPU might perform sub-optimally on an edge device.\n\n\n\nBenchmarks\nHere are some original works that laid the fundamental groundwork for developing systematic benchmarks for inference machine learning systems.\nMLPerf Inference Benchmark\nMLPerf Inference is a comprehensive benchmark suite that assesses machine learning models’ performance during the inference phase. It encompasses a variety of workloads, including image classification, object detection, and natural language processing, aiming to provide standardized and insightful metrics for evaluating different inference systems.\nMetrics:\n\nInference time\nLatency\nThroughput\nAccuracy\nEnergy consumption\n\nAI Benchmark\nAI Benchmark is a benchmarking tool that evaluates the performance of AI and machine learning models on mobile devices and edge computing platforms. It includes tests for image classification, object detection, and natural language processing tasks, providing a detailed analysis of the inference performance on different hardware platforms.\nMetrics:\n\nInference time\nLatency\nEnergy consumption\nMemory usage\nThroughput\n\nOpenVINO toolkit\nOpenVINO toolkit provides a benchmark tool to measure the performance of deep learning models for various tasks, such as image classification, object detection, and facial recognition, on Intel hardware. It offers detailed insights into the models’ inference performance on different hardware configurations.\nMetrics:\n\nInference time\nThroughput\nLatency\nCPU and GPU utilization\n\nExample Use Case\nConsider a scenario where we want to evaluate the inference performance of an object detection model on a specific edge device.\nTask: The task is to perform real-time object detection on video streams, detecting and identifying objects such as vehicles, pedestrians, and traffic signs.\nBenchmark: We can use the AI Benchmark for this task as it evaluates inference performance on edge devices, which suits our scenario.\nMetrics: We will measure the following metrics:\n\nInference time to process each video frame\nLatency to generate the bounding boxes for detected objects\nEnergy consumption during the inference process\nThroughput in terms of video frames processed per second\n\nBy measuring these metrics, we can assess the performance of the object detection model on the edge device and identify any potential bottlenecks or areas for optimization to enhance real-time processing capabilities.\n\n\n\n\n\n\nExercise 11.2: Inference Benchmarks - MLPerf\n\n\n\n\n\nGet ready to put your AI models to the ultimate test! MLPerf is like the Olympics for machine learning performance. In this Colab, we’ll use a toolkit called CK to run official MLPerf benchmarks, measure how fast and accurate your model is, and even use TVM to give it a super speed boost. Are you ready to see your model earn its medal?\n\n\n\n\n\n\n\n11.4.6 Benchmark Example\nTo properly illustrate the components of a systems benchmark, we can look at the keyword spotting benchmark in MLPerf Tiny and explain the motivation behind each decision.\n\nTask\nKeyword spotting was selected as a task because it is a common use case in TinyML that has been well-established for years. Additionally, the typical hardware used for keyword spotting differs substantially from the offerings of other benchmarks, such as MLPerf Inference’s speech recognition task.\n\n\nDataset\nGoogle Speech Commands(Warden (2018)) was selected as the best dataset to represent the task. The dataset is well-established in the research community and has permissive licensing, allowing it to be easily used in a benchmark.\n\nWarden, Pete. 2018. “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition.” ArXiv Preprint abs/1804.03209. https://arxiv.org/abs/1804.03209.\n\n\nModel\nThe next core component is the model, which will act as the primary workload for the benchmark. The model should be well established as a solution to the selected task rather than a state-of-the-art solution. The model selected is a simple depthwise separable convolution model. This architecture is not the state-of-the-art solution to the task, but it is well-established and not designed for a specific hardware platform like many state-of-the-art solutions. Despite being an inference benchmark, the benchmark also establishes a reference training recipe to be fully reproducible and transparent.\n\n\nMetrics\nLatency was selected as the primary metric for the benchmark, as keyword spotting systems need to react quickly to maintain user satisfaction. Additionally, given that TinyML systems are often battery-powered, energy consumption is measured to ensure the hardware platform is efficient. The accuracy of the model is also measured to ensure that the optimizations applied by a submitter, such as quantization, don’t degrade the accuracy beyond a threshold.\n\n\nBenchmark Harness\nMLPerf Tiny uses EEMBCs EnergyRunner benchmark harness to load the inputs to the model and isolate and measure the device’s energy consumption. When measuring energy consumption, it’s critical to select a harness that is accurate at the expected power levels of the devices under test and simple enough not to become a burden for the benchmark participants.\n\n\nBaseline Submission\nBaseline submissions are critical for contextualizing results and as a reference point to help participants get started. The baseline submission should prioritize simplicity and readability over state-of-the-art performance. The keyword spotting baseline uses a standard STM microcontroller as its hardware and TensorFlow Lite for Microcontrollers (David et al. (2021)) as its inference framework.\n\nDavid, Robert, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, et al. 2021. “Tensorflow Lite Micro: Embedded Machine Learning for Tinyml Systems.” Proceedings of Machine Learning and Systems 3: 800–811.\n\n\n\n11.4.7 Challenges and Limitations\nWhile benchmarking provides a structured methodology for performance evaluation in complex domains like artificial intelligence and computing, the process also poses several challenges. If not properly addressed, these challenges can undermine the credibility and accuracy of benchmarking results. Some of the predominant difficulties faced in benchmarking include the following:\n\nIncomplete problem coverage—Benchmark tasks may not fully represent the problem space. For instance, common image classification datasets like CIFAR-10 have limited diversity in image types. Algorithms tuned for such benchmarks may fail to generalize well to real-world datasets.\nStatistical insignificance - Benchmarks must have enough trials and data samples to produce statistically significant results. For example, benchmarking an OCR model on only a few text scans may not adequately capture its true error rates.\nLimited reproducibility—Varying hardware, software versions, codebases, and other factors can reduce the reproducibility of benchmark results. MLPerf addresses this by providing reference implementations and environment specifications.\nMisalignment with end goals - Benchmarks focusing only on speed or accuracy metrics may misalign real-world objectives like cost and power efficiency. Benchmarks must reflect all critical performance axes.\nRapid staleness—Due to the rapid pace of advancements in AI and computing, benchmarks and their datasets can quickly become outdated. Maintaining up-to-date benchmarks is thus a persistent challenge.\n\nBut of all these, the most important challenge is benchmark engineering.\n\nHardware Lottery\nThe “hardware lottery” in benchmarking machine learning systems refers to the situation where the success or efficiency of a machine learning model is significantly influenced by the compatibility of the model with the underlying hardware (Chu et al. 2021). In other words, some models perform exceptionally well because they are a good fit for the particular characteristics or capabilities of the hardware they are run on rather than because they are intrinsically superior models. Figure 11.2 demonstrates the performance of different models on different hardware: notice how (follow the big yellow arrow) the Mobilenet V3 Large model (in green) has the lowest latency among all models when run unquantized on the Pixel4 CPU. At the same time, it performs the worst on Pixel4 DSP Qualcomm Snapdragon 855. Unfortunately, the hardware used is often omitted from papers or only briefly mentioned, making reproducing results difficult, if possible.\n\nChu, Grace, Okan Arikan, Gabriel Bender, Weijun Wang, Achille Brighton, Pieter-Jan Kindermans, Hanxiao Liu, Berkin Akin, Suyog Gupta, and Andrew Howard. 2021. “Discovering Multi-Hardware Mobile Models via Architecture Search.” In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 3022–31. IEEE. https://doi.org/10.1109/cvprw53098.2021.00337.\n\n\n\n\n\n\nFigure 11.2: Hardware Lottery.\n\n\n\nFor instance, certain machine learning models may be designed and optimized to take advantage of the parallel processing capabilities of specific hardware accelerators, such as Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs). As a result, these models might show superior performance when benchmarked on such hardware compared to other models that are not optimized for the hardware.\nFor example, a 2018 paper introduced a new convolutional neural network architecture for image classification that achieved state-of-the-art accuracy on ImageNet. However, the paper only mentioned that the model was trained on 8 GPUs without specifying the model, memory size, or other relevant details. A follow-up study tried to reproduce the results but found that training the same model on commonly available GPUs achieved 10% lower accuracy, even after hyperparameter tuning. The original hardware likely had far higher memory bandwidth and compute power. As another example, training times for large language models can vary drastically based on the GPUs used.\nThe “hardware lottery” can introduce challenges and biases in benchmarking machine learning systems, as the model’s performance is not solely dependent on the model’s architecture or algorithm but also on the compatibility and synergies with the underlying hardware. This can make it difficult to compare different models fairly and to identify the best model based on its intrinsic merits. It can also lead to a situation where the community converges on models that are a good fit for the popular hardware of the day, potentially overlooking other models that might be superior but incompatible with the current hardware trends.\n\n\nBenchmark Engineering\nHardware lottery occurs when a machine learning model unintentionally performs exceptionally well or poorly on a specific hardware setup due to unforeseen compatibility or incompatibility. The model is not explicitly designed or optimized for that particular hardware by the developers or engineers; rather, it happens to align or (mis)align with the hardware’s capabilities or limitations. In this case, the model’s performance on the hardware is a byproduct of coincidence rather than design.\nIn contrast to the accidental hardware lottery, benchmark engineering involves deliberately optimizing or designing a machine learning model to perform exceptionally well on specific hardware, often to win benchmarks or competitions. This intentional optimization might include tweaking the model’s architecture, algorithms, or parameters to exploit the hardware’s features and capabilities fully.\n\n\nProblem\nBenchmark engineering refers to tweaking or modifying an AI system to optimize performance on specific benchmark tests, often at the expense of generalizability or real-world performance. This can include adjusting hyperparameters, training data, or other aspects of the system specifically to achieve high scores on benchmark metrics without necessarily improving the overall functionality or utility of the system.\nThe motivation behind benchmark engineering often stems from the desire to achieve high-performance scores for marketing or competitive purposes. High benchmark scores can demonstrate the superiority of an AI system compared to competitors and can be a key selling point for potential users or investors. This pressure to perform well on benchmarks sometimes leads to prioritizing benchmark-specific optimizations over more holistic improvements to the system.\nIt can lead to several risks and challenges. One of the primary risks is that the AI system may perform better in real-world applications than the benchmark scores suggest. This can lead to user dissatisfaction, reputational damage, and potential safety or ethical concerns. Furthermore, benchmark engineering can contribute to a lack of transparency and accountability in the AI community, as it can be difficult to discern how much of an AI system’s performance is due to genuine improvements versus benchmark-specific optimizations.\nThe AI community must prioritize transparency and accountability to mitigate the risks associated with benchmark engineering. This can include disclosing any optimizations or adjustments made specifically for benchmark tests and providing more comprehensive evaluations of AI systems that include real-world performance metrics and benchmark scores. Researchers and developers must prioritize holistic improvements to AI systems that improve their generalizability and functionality across various applications rather than focusing solely on benchmark-specific optimizations.\n\n\nIssues\nOne of the primary problems with benchmark engineering is that it can compromise the real-world performance of AI systems. When developers focus on optimizing their systems to achieve high scores on specific benchmark tests, they may neglect other important system performance aspects crucial in real-world applications. For example, an AI system designed for image recognition might be engineered to perform exceptionally well on a benchmark test that includes a specific set of images but needs help to recognize images slightly different from those in the test set accurately.\nAnother area for improvement with benchmark engineering is that it can result in AI systems that lack generalizability. In other words, while the system may perform well on the benchmark test, it may need help handling a diverse range of inputs or scenarios. For instance, an AI model developed for natural language processing might be engineered to achieve high scores on a benchmark test that includes a specific type of text but fails to process text that falls outside of that specific type accurately.\nIt can also lead to misleading results. When AI systems are engineered to perform well on benchmark tests, the results may not accurately reflect the system’s true capabilities. This can be problematic for users or investors who rely on benchmark scores to make informed decisions about which AI systems to use or invest in. For example, an AI system engineered to achieve high scores on a benchmark test for speech recognition might need to be more capable of accurately recognizing speech in real-world situations, leading users or investors to make decisions based on inaccurate information.\n\n\nMitigation\nThere are several ways to mitigate benchmark engineering. Transparency in the benchmarking process is crucial to maintaining benchmark accuracy and reliability. This involves clearly disclosing the methodologies, data sets, and evaluation criteria used in benchmark tests, as well as any optimizations or adjustments made to the AI system for the purpose of the benchmark.\nOne way to achieve transparency is through the use of open-source benchmarks. Open-source benchmarks are made publicly available, allowing researchers, developers, and other stakeholders to review, critique, and contribute to them, thereby ensuring their accuracy and reliability. This collaborative approach also facilitates sharing best practices and developing more robust and comprehensive benchmarks.\nOne example is the MLPerf Tiny. It’s an open-source framework designed to make it easy to compare different solutions in the world of TinyML. Its modular design allows components to be swapped out for comparison or improvement. The reference implementations, shown in green and orange in Figure 11.3, act as the baseline for results. TinyML often needs optimization across the entire system, and users can contribute by focusing on specific parts, like quantization. The modular benchmark design allows users to showcase their contributions and competitive advantage by modifying a reference implementation. In short, MLPerf Tiny offers a flexible and modular way to assess and enhance TinyML applications, making it easier to compare and improve different aspects of the technology.\n\n\n\n\n\n\nFigure 11.3: MLPerf Tiny modular design. Credit: Mattson et al. (2020a).\n\n\n———, et al. 2020a. “MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance.” IEEE Micro 40 (2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\n\n\nAnother method for achieving transparency is through peer review of benchmarks. This involves having independent experts review and validate the benchmark’s methodology, data sets, and results to ensure their credibility and reliability. Peer review can provide a valuable means of verifying the accuracy of benchmark tests and help build confidence in the results.\nStandardization of benchmarks is another important solution to mitigate benchmark engineering. Standardized benchmarks provide a common framework for evaluating AI systems, ensuring consistency and comparability across different systems and applications. This can be achieved by developing industry-wide standards and best practices for benchmarking and through common metrics and evaluation criteria.\nThird-party verification of results can also be valuable in mitigating benchmark engineering. This involves having an independent third party verify the results of a benchmark test to ensure their credibility and reliability. Third-party verification can build confidence in the results and provide a valuable means of validating the performance and capabilities of AI systems.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1188,7 +1188,7 @@ "section": "11.5 Model Benchmarking", "text": "11.5 Model Benchmarking\nBenchmarking machine learning models is important for determining the effectiveness and efficiency of various machine learning algorithms in solving specific tasks or problems. By analyzing the results obtained from benchmarking, developers and researchers can identify their models’ strengths and weaknesses, leading to more informed decisions on model selection and further optimization.\nThe evolution and progress of machine learning models are intrinsically linked to the availability and quality of data sets. In machine learning, data acts as the raw material that powers the algorithms, allowing them to learn, adapt, and ultimately perform tasks that were traditionally the domain of humans. Therefore, it is important to understand this history.\n\n11.5.1 Historical Context\nMachine learning datasets have a rich history and have evolved significantly over the years, growing in size, complexity, and diversity to meet the ever-increasing demands of the field. Let’s take a closer look at this evolution, starting from one of the earliest and most iconic datasets – MNIST.\n\nMNIST (1998)\nThe MNIST dataset, created by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges in 1998, can be considered a cornerstone in the history of machine learning datasets. It comprises 70,000 labeled 28x28 pixel grayscale images of handwritten digits (0-9). MNIST has been widely used for benchmarking algorithms in image processing and machine learning as a starting point for many researchers and practitioners. Figure 11.4 shows some examples of handwritten digits.\n\n\n\n\n\n\nFigure 11.4: MNIST handwritten digits. Credit: Suvanjanprasai.\n\n\n\n\n\nImageNet (2009)\nFast forward to 2009, and we see the introduction of the ImageNet dataset, which marked a significant leap in the scale and complexity of datasets. ImageNet consists of over 14 million labeled images spanning more than 20,000 categories. Fei-Fei Li and her team developed it to advance object recognition and computer vision research. The dataset became synonymous with the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an annual competition crucial in developing deep learning models, including the famous AlexNet in 2012.\n\n\nCOCO (2014)\nThe Common Objects in Context (COCO) dataset(Lin et al. (2014)), released in 2014, further expanded the landscape of machine learning datasets by introducing a richer set of annotations. COCO consists of images containing complex scenes with multiple objects, and each image is annotated with object bounding boxes, segmentation masks, and captions. This dataset has been instrumental in advancing research in object detection, segmentation, and image captioning.\n\nLin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. “Microsoft Coco: Common Objects in Context.” In Computer VisionECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part v 13, 740–55. Springer.\nhttps://cocodataset.org/images/jpg/coco-examples.jpg\n\n\nGPT-3 (2020)\nWhile the above examples primarily focus on image datasets, there have also been significant developments in text datasets. One notable example is GPT-3 (Brown et al. 2020), developed by OpenAI. GPT-3 is a language model trained on diverse internet text. Although the dataset used to train GPT-3 is not publicly available, the model itself, consisting of 175 billion parameters, is a testament to the scale and complexity of modern machine learning datasets and models.\n\nBrown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.\n\n\nPresent and Future\nToday, we have a plethora of datasets spanning various domains, including healthcare, finance, social sciences, and more. The following characteristics help us taxonomize the space and growth of machine learning datasets that fuel model development.\n\nDiversity of Data Sets: The variety of data sets available to researchers and engineers has expanded dramatically, covering many fields, including natural language processing, image recognition, and more. This diversity has fueled the development of specialized machine-learning models tailored to specific tasks, such as translation, speech recognition, and facial recognition.\nVolume of Data: The sheer volume of data that has become available in the digital age has also played a crucial role in advancing machine learning models. Large data sets enable models to capture the complexity and nuances of real-world phenomena, leading to more accurate and reliable predictions.\nQuality and Cleanliness of Data: The quality of data is another critical factor that influences the performance of machine learning models. Clean, well-labeled, and unbiased data sets are essential for training models that are robust and fair.\nOpen Access to Data: The availability of open-access data sets has also contributed significantly to machine learning’s progress. Open data allows researchers from around the world to collaborate, share insights, and build upon each other’s work, leading to faster innovation and the development of more advanced models.\nEthics and Privacy Concerns: As data sets grow in size and complexity, ethical considerations and privacy concerns become increasingly important. There is an ongoing debate about the balance between leveraging data for machine learning advancements and protecting individuals’ privacy rights.\n\nThe development of machine learning models relies heavily on the availability of diverse, large, high-quality, and open-access data sets. As we move forward, addressing the ethical considerations and privacy concerns associated with using large data sets is crucial to ensure that machine learning technologies benefit society. There is a growing awareness that data acts as the rocket fuel for machine learning, driving and fueling the development of machine learning models. Consequently, more focus is being placed on developing the data sets themselves. We will explore this in further detail in the data benchmarking section.\n\n\n\n11.5.2 Model Metrics\nMachine learning model evaluation has evolved from a narrow focus on accuracy to a more comprehensive approach considering a range of factors, from ethical considerations and real-world applicability to practical constraints like model size and efficiency. This shift reflects the field’s maturation as machine learning models are increasingly applied in diverse, complex real-world scenarios.\n\nAccuracy\nAccuracy is one of the most intuitive and commonly used metrics for evaluating machine learning models. At its core, accuracy measures the proportion of correct predictions made by the model out of all predictions. For example, imagine we have developed a machine learning model to classify images as either containing a cat or not. If we test this model on a dataset of 100 images, and it correctly identifies 90 of them, we would calculate its accuracy as 90%.\nIn the initial stages of machine learning, accuracy was often the primary, if not the only, metric considered when evaluating model performance. This is understandable, given its straightforward nature and ease of interpretation. However, as the field has progressed, the limitations of relying solely on accuracy have become more apparent.\nConsider the example of a medical diagnosis model with an accuracy of 95%. While at first glance this may seem impressive, we must delve deeper to assess the model’s performance fully. Suppose the model fails to accurately diagnose severe conditions that, while rare, can have severe consequences; its high accuracy may not be as meaningful. A pertinent example of this is Google’s retinopathy machine learning model, which was designed to diagnose diabetic retinopathy and diabetic macular edema from retinal photographs.\nThe Google model demonstrated impressive accuracy levels in lab settings. Still, when deployed in real-world clinical environments in Thailand, it faced significant challenges. In the real-world setting, the model encountered diverse patient populations, varying image quality, and a range of different medical conditions that it had not been exposed to during its training. Consequently, its performance could have been better, and it struggled to maintain the same accuracy levels observed in lab settings. This example serves as a clear reminder that while high accuracy is an important and desirable attribute for a medical diagnosis model, it must be evaluated in conjunction with other factors, such as the model’s ability to generalize to different populations and handle diverse and unpredictable real-world conditions, to understand its value and potential impact on patient care truly.\nSimilarly, if the model performs well on average but exhibits significant disparities in performance across different demographic groups, this, too, would be cause for concern.\nThe evolution of machine learning has thus seen a shift towards a more holistic approach to model evaluation, taking into account not just accuracy, but also other crucial factors such as fairness, transparency, and real-world applicability. A prime example is the Gender Shades project at MIT Media Lab, led by Joy Buolamwini, highlighting significant racial and gender biases in commercial facial recognition systems. The project evaluated the performance of three facial recognition technologies developed by IBM, Microsoft, and Face++. It found that they all exhibited biases, performing better on lighter-skinned and male faces compared to darker-skinned and female faces.\nWhile accuracy remains a fundamental and valuable metric for evaluating machine learning models, a more comprehensive approach is required to fully assess a model’s performance. This means considering additional metrics that account for fairness, transparency, and real-world applicability, as well as conducting rigorous testing across diverse datasets to uncover and mitigate any potential biases. The move towards a more holistic approach to model evaluation reflects the maturation of the field and its increasing recognition of the real-world implications and ethical considerations associated with deploying machine learning models.\n\n\nFairness\nFairness in machine learning models is a multifaceted and critical aspect that requires careful attention, particularly in high-stakes applications that significantly affect people’s lives, such as in loan approval processes, hiring, and criminal justice. It refers to the equitable treatment of all individuals, irrespective of their demographic or social attributes such as race, gender, age, or socioeconomic status.\nSimply relying on accuracy can be insufficient and potentially misleading when evaluating models. For instance, consider a loan approval model with a 95% accuracy rate. While this figure may appear impressive at first glance, it does not reveal how the model performs across different demographic groups. If this model consistently discriminates against a particular group, its accuracy is less commendable, and its fairness is questioned.\nDiscrimination can manifest in various forms, such as direct discrimination, where a model explicitly uses sensitive attributes like race or gender in its decision-making process, or indirect discrimination, where seemingly neutral variables correlate with sensitive attributes, indirectly influencing the model’s outcomes. An infamous example of the latter is the COMPAS tool used in the US criminal justice system, which exhibited racial biases in predicting recidivism rates despite not explicitly using race as a variable.\nAddressing fairness involves careful examination of the model’s performance across diverse groups, identifying potential biases, and rectifying disparities through corrective measures such as re-balancing datasets, adjusting model parameters, and implementing fairness-aware algorithms. Researchers and practitioners continuously develop metrics and methodologies tailored to specific use cases to evaluate fairness in real-world scenarios. For example, disparate impact analysis, demographic parity, and equal opportunity are some of the metrics employed to assess fairness.\nAdditionally, transparency and interpretability of models are fundamental to achieving fairness. Understanding how a model makes decisions can reveal potential biases and enable stakeholders to hold developers accountable. Open-source tools like AI Fairness 360 by IBM and Fairness Indicators by TensorFlow are being developed to facilitate fairness assessments and mitigation of biases in machine learning models.\nEnsuring fairness in machine learning models, particularly in applications that significantly impact people’s lives, requires rigorous evaluation of the model’s performance across diverse groups, careful identification and mitigation of biases, and implementation of transparency and interpretability measures. By comprehensively addressing fairness, we can work towards developing machine learning models that are equitable, just, and beneficial for society.\n\n\nComplexity\n\nParameters*\nIn the initial stages of machine learning, model benchmarking often relied on parameter counts as a proxy for model complexity. The rationale was that more parameters typically lead to a more complex model, which should, in turn, deliver better performance. However, this approach has proven inadequate as it needs to account for the computational cost associated with processing many parameters.\nFor example, GPT-3, developed by OpenAI, is a language model that boasts an astounding 175 billion parameters. While it achieves state-of-the-art performance on various natural language processing tasks, its size and the computational resources required to run it make it impractical for deployment in many real-world scenarios, especially those with limited computational capabilities.\nRelying on parameter counts as a proxy for model complexity also fails to consider the model’s efficiency. If optimized for efficiency, a model with fewer parameters might be just as effective, if not more so, than a model with a higher parameter count. For instance, MobileNets, developed by Google, is a family of models designed specifically for mobile and edge devices. They utilize depth-wise separable convolutions to reduce the number of parameters and computational costs while still achieving competitive performance.\nIn light of these limitations, the field has moved towards a more holistic approach to model benchmarking that considers parameter counts and other crucial factors such as floating-point operations per second (FLOPs), memory consumption, and latency. FLOPs, in particular, have emerged as an important metric as they provide a more accurate representation of the computational load a model imposes. This shift towards a more comprehensive approach to model benchmarking reflects a recognition of the need to balance performance with practicality, ensuring that models are effective, efficient, and deployable in real-world scenarios.\n\n\nFLOPS\nThe size of a machine learning model is an essential aspect that directly impacts its usability in practical scenarios, especially when computational resources are limited. Traditionally, the number of parameters in a model was often used as a proxy for its size, with the underlying assumption being that more parameters would translate to better performance. However, this simplistic view does not consider the computational cost of processing these parameters. This is where the concept of floating-point operations per second (FLOPs) comes into play, providing a more accurate representation of the computational load a model imposes.\nFLOPs measure the number of floating-point operations a model performs to generate a prediction. A model with many FLOPs requires substantial computational resources to process the vast number of operations, which may render it impractical for certain applications. Conversely, a model with a lower FLOP count is more lightweight and can be easily deployed in scenarios where computational resources are limited.\nLet’s consider an example. BERT Bidirectional Encoder Representations from Transformers, a popular natural language processing model, has over 340 million parameters, making it a large model with high accuracy and impressive performance across various tasks. However, the sheer size of BERT, coupled with its high FLOP count, makes it a computationally intensive model that may not be suitable for real-time applications or deployment on edge devices with limited computational capabilities.\nIn light of this, there has been a growing interest in developing smaller models that can achieve similar performance levels as their larger counterparts while being more efficient in computational load. DistilBERT, for instance, is a smaller version of BERT that retains 97% of its performance while being 40% smaller in terms of parameter count. The size reduction also translates to a lower FLOP count, making DistilBERT a more practical choice for resource-constrained scenarios.\nIn summary, while parameter count provides a useful indication of model size, it is not a comprehensive metric as it needs to consider the computational cost associated with processing these parameters. FLOPs, on the other hand, offer a more accurate representation of a model’s computational load and are thus an essential consideration when deploying machine learning models in real-world scenarios, particularly when computational resources are limited. The evolution from relying solely on parameter count to considering FLOPs signifies a maturation in the field, reflecting a greater awareness of the practical constraints and challenges of deploying machine learning models in diverse settings.\n\n\nEfficiency\nEfficiency metrics, such as memory consumption and latency/throughput, have also gained prominence. These metrics are particularly crucial when deploying models on edge devices or in real-time applications, as they measure how quickly a model can process data and how much memory it requires. In this context, Pareto curves are often used to visualize the trade-off between different metrics, helping stakeholders decide which model best suits their needs.\n\n\n\n\n11.5.3 Lessons Learned\nModel benchmarking has offered us several valuable insights that can be leveraged to drive innovation in system benchmarks. The progression of machine learning models has been profoundly influenced by the advent of leaderboards and the open-source availability of models and datasets. These elements have served as significant catalysts, propelling innovation and accelerating the integration of cutting-edge models into production environments. However, as we will explore further, these are not the only contributors to the development of machine learning benchmarks.\nLeaderboards play a vital role in providing an objective and transparent method for researchers and practitioners to evaluate the efficacy of different models, ranking them based on their performance in benchmarks. This system fosters a competitive environment, encouraging the development of models that are not only accurate but also efficient. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a prime example of this, with its annual leaderboard significantly contributing to developing groundbreaking models such as AlexNet.\nOpen-source access to state-of-the-art models and datasets further democratizes machine learning, facilitating collaboration among researchers and practitioners worldwide. This open access accelerates the process of testing, validation, and deployment of new models in production environments, as evidenced by the widespread adoption of models like BERT and GPT-3 in various applications, from natural language processing to more complex, multi-modal tasks.\nCommunity collaboration platforms like Kaggle have revolutionized the field by hosting competitions that unite data scientists from across the globe to solve intricate problems. Specific benchmarks serve as the goalposts for innovation and model development.\nMoreover, the availability of diverse and high-quality datasets is paramount in training and testing machine learning models. Datasets such as ImageNet have played an instrumental role in the evolution of image recognition models, while extensive text datasets have facilitated advancements in natural language processing models.\nLastly, the contributions of academic and research institutions must be supported. Their role in publishing research papers, sharing findings at conferences, and fostering collaboration between various institutions has significantly contributed to advancing machine learning models and benchmarks.\n\nEmerging Trends\nAs machine learning models become more sophisticated, so do the benchmarks required to assess them accurately. There are several emerging benchmarks and datasets that are gaining popularity due to their ability to evaluate models in more complex and realistic scenarios:\nMultimodal Datasets: These datasets contain multiple data types, such as text, images, and audio, to represent real-world situations better. An example is the VQA (Visual Question Answering) dataset (Antol et al. 2015), where models’ ability to answer text-based questions about images is tested.\n\nAntol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. “VQA: Visual Question Answering.” In 2015 IEEE International Conference on Computer Vision (ICCV), 2425–33. IEEE. https://doi.org/10.1109/iccv.2015.279.\nFairness and Bias Evaluation: There is an increasing focus on creating benchmarks assessing machine learning models’ fairness and bias. Examples include the AI Fairness 360 toolkit, which offers a comprehensive set of metrics and datasets for evaluating bias in models.\nOut-of-Distribution Generalization: Testing how well models perform on data different from the original training distribution. This evaluates the model’s ability to generalize to new, unseen data. Example benchmarks are Wilds (Koh et al. 2021), RxRx, and ANC-Bench.\n\nKoh, Pang Wei, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, et al. 2021. “WILDS: A Benchmark of in-the-Wild Distribution Shifts.” In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, edited by Marina Meila and Tong Zhang, 139:5637–64. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/koh21a.html.\n\nHendrycks, Dan, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. 2021. “Natural Adversarial Examples.” In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15262–71. IEEE. https://doi.org/10.1109/cvpr46437.2021.01501.\n\nXie, Cihang, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L. Yuille, and Quoc V. Le. 2020. “Adversarial Examples Improve Image Recognition.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 816–25. IEEE. https://doi.org/10.1109/cvpr42600.2020.00090.\nAdversarial Robustness: Evaluating model performance under adversarial attacks or perturbations to the input data. This tests the model’s robustness. Example benchmarks are ImageNet-A (Hendrycks et al. 2021), ImageNet-C (Xie et al. 2020), and CIFAR-10.1.\nReal-World Performance: Testing models on real-world datasets that closely match end tasks rather than just canned benchmark datasets. Examples are medical imaging datasets for healthcare tasks or customer support chat logs for dialogue systems.\nEnergy and Compute Efficiency: Benchmarks that measure the computational resources required to achieve a particular accuracy. This evaluates the model’s Efficiency. Examples are MLPerf and Greenbench, already discussed in the Systems benchmarking section.\nInterpretability and Explainability: Benchmarks that assess how easy it is to understand and explain a model’s internal logic and predictions. Example metrics are faithfulness to input gradients and coherence of explanations.\n\n\n\n11.5.4 Limitations and Challenges\nWhile model benchmarks are an essential tool in assessing machine learning models, several limitations and challenges should be addressed to ensure that they accurately reflect a model’s performance in real-world scenarios.\nDataset does not Correspond to Real-World Scenarios: Often, the data used in model benchmarks is cleaned and preprocessed to such an extent that it may need to accurately represent the data that a model would encounter in real-world applications. This idealized data version can lead to overestimating a model’s performance. In the case of the ImageNet dataset, the images are well-labeled and categorized. Still, in a real-world scenario, a model may need to deal with blurry images that could be better lit or taken from awkward angles. This discrepancy can significantly affect the model’s performance.\nSim2Real Gap: The Sim2Real gap refers to the difference in the performance of a model when transitioning from a simulated environment to a real-world environment. This gap is often observed in robotics, where a robot trained in a simulated environment struggles to perform tasks in the real world due to the complexity and unpredictability of real-world environments. A robot trained to pick up objects in a simulated environment may need help to perform the same task in the real world because the simulated environment does not accurately represent the complexities of real-world physics, lighting, and object variability.\nChallenges in Creating Datasets: Creating a dataset for model benchmarking is a challenging task that requires careful consideration of various factors such as data quality, diversity, and representation. As discussed in the data engineering section, ensuring that the data is clean, unbiased, and representative of the real-world scenario is crucial for the accuracy and reliability of the benchmark. For example, when creating a dataset for a healthcare-related task, it is important to ensure that the data is representative of the entire population and not biased towards a particular demographic. This ensures that the model performs well across diverse patient populations.\nModel benchmarks are essential in measuring the capability of a model architecture in solving a fixed task, but it is important to address the limitations and challenges associated with them. This includes ensuring that the dataset accurately represents real-world scenarios, addressing the Sim2Real gap, and overcoming the challenges of creating unbiased and representative datasets. By addressing these challenges and many others, we can ensure that model benchmarks provide a more accurate and reliable assessment of a model’s performance in real-world applications.\nThe Speech Commands dataset and its successor MSWC, are common benchmarks for one of the quintessential TinyML applications, keyword spotting. Speech commands establish streaming error metrics beyond the standard top-1 classification accuracy more relevant to the keyword spotting use case. Using case-relevant metrics is what elevates a dataset to a model benchmark.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1199,7 +1199,7 @@ "section": "11.6 Data Benchmarking", "text": "11.6 Data Benchmarking\nFor the past several years, AI has focused on developing increasingly sophisticated machine learning models like large language models. The goal has been to create models capable of human-level or superhuman performance on a wide range of tasks by training them on massive datasets. This model-centric approach produced rapid progress, with models attaining state-of-the-art results on many established benchmarks. Figure 11.5 shows the performance of AI systems relative to human performance (marked by the horizontal line at 0) across five applications: handwriting recognition, speech recognition, image recognition, reading comprehension, and language understanding. Over the past decade, the AI performance has surpassed that of humans.\nHowever, growing concerns about issues like bias, safety, and robustness persist even in models that achieve high accuracy on standard benchmarks. Additionally, some popular datasets used for evaluating models are beginning to saturate, with models reaching near-perfect performance on existing test splits (Kiela et al. 2021). As a simple example, there are test images in the classic MNIST handwritten digit dataset that may look indecipherable to most human evaluators but were assigned a label when the dataset was created - models that happen to agree with those labels may appear to exhibit superhuman performance but instead may only be capturing idiosyncrasies of the labeling and acquisition process from the dataset’s creation in 1994. In the same spirit, computer vision researchers now ask, “Are we done with ImageNet?” (Beyer et al. 2020). This highlights limitations in the conventional model-centric approach of optimizing accuracy on fixed datasets through architectural innovations.\n\nBeyer, Lucas, Olivier J Hénaff, Alexander Kolesnikov, Xiaohua Zhai, and Aäron van den Oord. 2020. “Are We Done with Imagenet?” ArXiv Preprint abs/2006.07159. https://arxiv.org/abs/2006.07159.\n\n\n\n\n\n\nFigure 11.5: AI vs human performane. Credit: Kiela et al. (2021).\n\n\nKiela, Douwe, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, et al. 2021. “Dynabench: Rethinking Benchmarking in NLP.” In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4110–24. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.324.\n\n\nAn alternative paradigm is emerging called data-centric AI. Rather than treating data as static and focusing narrowly on model performance, this approach recognizes that models are only as good as their training data. So, the emphasis shifts to curating high-quality datasets that better reflect real-world complexity, developing more informative evaluation benchmarks, and carefully considering how data is sampled, preprocessed, and augmented. The goal is to optimize model behavior by improving the data rather than just optimizing metrics on flawed datasets. Data-centric AI critically examines and enhances the data itself to produce beneficial AI. This reflects an important evolution in mindset as the field addresses the shortcomings of narrow benchmarking.\nThis section will explore the key differences between model-centric and data-centric approaches to AI. This distinction has important implications for how we benchmark AI systems. Specifically, we will see how focusing on data quality and Efficiency can directly improve machine learning performance as an alternative to optimizing model architectures solely. The data-centric approach recognizes that models are only as good as their training data. So, enhancing data curation, evaluation benchmarks, and data handling processes can produce AI systems that are safer, fairer, and more robust. Rethinking benchmarking to prioritize data alongside models represents an important evolution as the field aims to deliver trustworthy real-world impact.\n\n11.6.1 Limitations of Model-Centric AI\nIn the model-centric AI era, a prominent characteristic was the development of complex model architectures. Researchers and practitioners dedicated substantial effort to devising sophisticated and intricate models in the quest for superior performance. This frequently involved the incorporation of additional layers and the fine-tuning of a multitude of hyperparameters to achieve incremental improvements in accuracy. Concurrently, there was a significant emphasis on leveraging advanced algorithms. These algorithms, often at the forefront of the latest research, were employed to enhance the performance of AI models. The primary aim of these algorithms was to optimize the learning process of models, thereby extracting maximal information from the training data.\nWhile the model-centric approach has been central to many advancements in AI, it has several areas for improvement. First, the development of complex model architectures can often lead to overfitting. This is when the model performs well on the training data but needs to generalize to new, unseen data. The additional layers and complexity can capture noise in the training data as if it were a real pattern, harming the model’s performance on new data.\nSecond, relying on advanced algorithms can sometimes obscure the real understanding of a model’s functioning. These algorithms often act as a black box, making it difficult to interpret how the model is making decisions. This lack of transparency can be a significant hurdle, especially in critical applications such as healthcare and finance, where understanding the model’s decision-making process is crucial.\nThird, the emphasis on achieving state-of-the-art results on benchmark datasets can sometimes be misleading. These datasets need to represent the complexities and variability of real-world data more fully. A model that performs well on a benchmark dataset may not necessarily generalize well to new, unseen data in a real-world application. This discrepancy can lead to false confidence in the model’s capabilities and hinder its practical applicability.\nLastly, the model-centric approach often relies on large labeled datasets for training. However, obtaining such datasets takes time and effort in many real-world scenarios. This reliance on large datasets also limits AI’s applicability in domains where data is scarce or expensive to label.\nAs a result of the above reasons, and many more, the AI community is shifting to a more data-centric approach. Rather than focusing just on model architecture, researchers are now prioritizing curating high-quality datasets, developing better evaluation benchmarks, and considering how data is sampled and preprocessed. The key idea is that models are only as good as their training data. So, focusing on getting the right data will allow us to develop AI systems that are more fair, safe, and aligned with human values. This data-centric shift represents an important change in mindset as AI progresses.\n\n\n11.6.2 The Shift Toward Data-centric AI\nData-centric AI is a paradigm that emphasizes the importance of high-quality, well-labeled, and diverse datasets in developing AI models. In contrast to the model-centric approach, which focuses on refining and iterating on the model architecture and algorithm to improve performance, data-centric AI prioritizes the quality of the input data as the primary driver of improved model performance. High-quality data is clean, well-labeled and representative of the real-world scenarios the model will encounter. In contrast, low-quality data can lead to poor model performance, regardless of the complexity or sophistication of the model architecture.\nData-centric AI puts a strong emphasis on the cleaning and labeling of data. Cleaning involves the removal of outliers, handling missing values, and addressing other data inconsistencies. Labeling, on the other hand, involves assigning meaningful and accurate labels to the data. Both these processes are crucial in ensuring that the AI model is trained on accurate and relevant data. Another important aspect of the data-centric approach is data augmentation. This involves artificially increasing the size and diversity of the dataset by applying various transformations to the data, such as rotation, scaling, and flipping training images. Data augmentation helps in improving the model’s robustness and generalization capabilities.\nThere are several benefits to adopting a data-centric approach to AI development. First and foremost, it leads to improved model performance and generalization capabilities. By ensuring that the model is trained on high-quality, diverse data, the model can better generalize to new, unseen data (Mattson et al. 2020b).\nAdditionally, a data-centric approach can often lead to simpler models that are easier to interpret and maintain. This is because the emphasis is on the data rather than the model architecture, meaning simpler models can achieve high performance when trained on high-quality data.\nThe shift towards data-centric AI represents a significant paradigm shift. By prioritizing the quality of the input data, this approach aims to improve model performance and generalization capabilities, ultimately leading to more robust and reliable AI systems. As we continue to advance in our understanding and application of AI, the data-centric approach is likely to play an important role in shaping the future of this field.\n\n\n11.6.3 Benchmarking Data\nData benchmarking aims to evaluate common issues in datasets, such as identifying label errors, noisy features, representation imbalance (for example, out of the 1000 classes in Imagenet-1K, there are over 100 categories which are just types of dogs), class imbalance (where some classes have many more samples than others), whether models trained on a given dataset can generalize to out-of-distribution features, or what types of biases might exist in a given dataset (Mattson et al. 2020b). In its simplest form, data benchmarking aims to improve accuracy on a test set by removing noisy or mislabeled training samples while keeping the model architecture fixed. Recent competitions in data benchmarking have invited participants to submit novel augmentation strategies and active learning techniques.\n\nMattson, Peter, Vijay Janapa Reddi, Christine Cheng, Cody Coleman, Greg Diamos, David Kanter, Paulius Micikevicius, et al. 2020b. “MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance.” IEEE Micro 40 (2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\nData-centric techniques continue to gain attention in benchmarking, especially as foundation models are increasingly trained on self-supervised objectives. Compared to smaller datasets like Imagenet-1K, massive datasets commonly used in self-supervised learning, such as Common Crawl, OpenImages, and LAION-5B, contain higher amounts of noise, duplicates, bias, and potentially offensive data.\nDataComp is a recently launched dataset competition that targets the evaluation of large corpora. DataComp focuses on language-image pairs used to train CLIP models. The introductory whitepaper finds that when the total compute budget for training is constant, the best-performing CLIP models on downstream tasks, such as ImageNet classification, are trained on just 30% of the available training sample pool. This suggests that proper filtering of large corpora is critical to improving the accuracy of foundation models. Similarly, Demystifying CLIP Data (Xu et al. 2023) asks whether the success of CLIP is attributable to the architecture or the dataset.\n\nXu, Hu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, and Christoph Feichtenhofer. 2023. “Demystifying CLIP Data.” ArXiv Preprint abs/2309.16671. https://arxiv.org/abs/2309.16671.\nDataPerf is another recent effort focusing on benchmarking data in various modalities. DataPerf provides rounds of online competition to spur improvement in datasets. The inaugural offering launched with challenges in vision, speech, acquisition, debugging, and text prompting for image generation.\n\n\n11.6.4 Data Efficiency\nAs machine learning models grow larger and more complex and compute resources become more scarce in the face of rising demand, it becomes challenging to meet the computation requirements even with the largest machine learning fleets. To overcome these challenges and ensure machine learning system scalability, it is necessary to explore novel opportunities that augment conventional approaches to resource scaling.\nImproving data quality can be a useful method to impact machine learning system performance significantly. One of the primary benefits of enhancing data quality is the potential to reduce the size of the training dataset while still maintaining or even improving model performance. This data size reduction directly relates to the amount of training time required, thereby allowing models to converge more quickly and efficiently. Achieving this balance between data quality and dataset size is a challenging task that requires the development of sophisticated methods, algorithms, and techniques.\nSeveral approaches can be taken to improve data quality. These methods include and are not limited to the following:\n\nData Cleaning: This involves handling missing values, correcting errors, and removing outliers. Clean data ensures that the model is not learning from noise or inaccuracies.\nData Interpretability and Explainability: Common techniques include LIME (Ribeiro, Singh, and Guestrin 2016), which provides insight into the decision boundaries of classifiers, and Shapley values (Lundberg and Lee 2017), which estimate the importance of individual samples in contributing to a model’s predictions.\nFeature Engineering: Transforming or creating new features can significantly improve model performance by providing more relevant information for learning.\nData Augmentation: Augmenting data by creating new samples through various transformations can help improve model robustness and generalization.\nActive Learning: This is a semi-supervised learning approach where the model actively queries a human oracle to label the most informative samples (Coleman et al. 2022). This ensures that the model is trained on the most relevant data.\nDimensionality Reduction: Techniques like PCA can reduce the number of features in a dataset, thereby reducing complexity and training time.\n\n\nRibeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “” Why Should i Trust You?” Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44.\n\nLundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, edited by Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, 4765–74. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.\n\nColeman, Cody, Edward Chou, Julian Katz-Samuels, Sean Culatana, Peter Bailis, Alexander C. Berg, Robert D. Nowak, Roshan Sumbaly, Matei Zaharia, and I. Zeki Yalniz. 2022. “Similarity Search for Efficient Active Learning and Search of Rare Concepts.” In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, the Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March 1, 2022, 6402–10. AAAI Press. https://ojs.aaai.org/index.php/AAAI/article/view/20591.\nThere are many other methods in the wild. But the goal is the same. Refining the dataset and ensuring it is of the highest quality can reduce the training time required for models to converge. However, achieving this requires developing and implementing sophisticated methods, algorithms, and techniques that can clean, preprocess, and augment data while retaining the most informative samples. This is an ongoing challenge that will require continued research and innovation in the field of machine learning.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1210,7 +1210,7 @@ "section": "11.7 The Trifecta", "text": "11.7 The Trifecta\nWhile system, model, and data benchmarks have traditionally been studied in isolation, there is a growing recognition that to understand and advance AI fully, we must take a more holistic view. By iterating between benchmarking systems, models, and datasets together, novel insights that are not apparent when these components are analyzed separately may emerge. System performance impacts model accuracy, model capabilities drive data needs, and data characteristics shape system requirements.\nBenchmarking the triad of system, model, and data in an integrated fashion will likely lead to discoveries about the co-design of AI systems, the generalization properties of models, and the role of data curation and quality in enabling performance. Rather than narrow benchmarks of individual components, the future of AI requires benchmarks that evaluate the symbiotic relationship between computing platforms, algorithms, and training data. This systems-level perspective will be critical to overcoming current limitations and unlocking the next level of AI capabilities.\nFigure 11.6 illustrates the many potential ways to interplay data benchmarking, model benchmarking, and system infrastructure benchmarking together. Exploring these intricate interactions is likely to uncover new optimization opportunities and enhancement capabilities. The data, model, and system benchmark triad offers a rich space for co-design and co-optimization.\n\n\n\n\n\n\nFigure 11.6: Benchmarking trifecta.\n\n\n\nWhile this integrated perspective represents an emerging trend, the field has much more to discover about the synergies and trade-offs between these components. As we iteratively benchmark combinations of data, models, and systems, new insights that remain hidden when these elements are studied in isolation will emerge. This multifaceted benchmarking approach charting the intersections of data, algorithms, and hardware promises to be a fruitful avenue for major progress in AI, even though it is still in its early stages.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1221,7 +1221,7 @@ "section": "11.8 Benchmarks for Emerging Technologies", "text": "11.8 Benchmarks for Emerging Technologies\nGiven their significant differences from existing techniques, emerging technologies can be particularly challenging to design benchmarks for. Standard benchmarks used for existing technologies may not highlight the key features of the new approach. In contrast, new benchmarks may be seen as contrived to favor the emerging technology over others. They may be so different from existing benchmarks that they cannot be understood and lose insightful value. Thus, benchmarks for emerging technologies must balance fairness, applicability, and ease of comparison with existing benchmarks.\nAn example of emerging technology where benchmarking has proven to be especially difficult is in Neuromorphic Computing. Using the brain as a source of inspiration for scalable, robust, and energy-efficient general intelligence, neuromorphic computing (Schuman et al. 2022) directly incorporates biologically realistic mechanisms in both computing algorithms and hardware, such as spiking neural networks (Maass 1997) and non-von Neumann architectures for executing them (Davies et al. 2018; Modha et al. 2023). From a full-stack perspective of models, training techniques, and hardware systems, neuromorphic computing differs from conventional hardware and AI. Thus, there is a key challenge in developing fair and useful benchmarks for guiding the technology.\n\nSchuman, Catherine D., Shruti R. Kulkarni, Maryam Parsa, J. Parker Mitchell, Prasanna Date, and Bill Kay. 2022. “Opportunities for Neuromorphic Computing Algorithms and Applications.” Nature Computational Science 2 (1): 10–19. https://doi.org/10.1038/s43588-021-00184-y.\n\nMaass, Wolfgang. 1997. “Networks of Spiking Neurons: The Third Generation of Neural Network Models.” Neural Networks 10 (9): 1659–71. https://doi.org/10.1016/s0893-6080(97)00011-7.\n\nDavies, Mike, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, et al. 2018. “Loihi: A Neuromorphic Manycore Processor with on-Chip Learning.” IEEE Micro 38 (1): 82–99. https://doi.org/10.1109/mm.2018.112130359.\n\nModha, Dharmendra S., Filipp Akopyan, Alexander Andreopoulos, Rathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta, et al. 2023. “Neural Inference at the Frontier of Energy, Space, and Time.” Science 382 (6668): 329–35. https://doi.org/10.1126/science.adh1174.\n\nYik, Jason, Soikat Hasan Ahmed, Zergham Ahmed, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu, et al. 2023. “NeuroBench: Advancing Neuromorphic Computing Through Collaborative, Fair and Representative Benchmarking.” https://arxiv.org/abs/2304.04640.\nAn ongoing initiative to develop standard neuromorphic benchmarks is NeuroBench (Yik et al. 2023). To suitably benchmark neuromorphic, NeuroBench follows high-level principles of inclusiveness through task and metric applicability to both neuromorphic and non-neuromorphic solutions, actionability of implementation using common tooling, and iterative updates to continue to ensure relevance as the field rapidly grows. NeuroBench and other benchmarks for emerging technologies provide critical guidance for future techniques, which may be necessary as the scaling limits of existing approaches draw nearer.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1232,7 +1232,7 @@ "section": "11.9 Conclusion", "text": "11.9 Conclusion\nWhat gets measured gets improved. This chapter has explored the multifaceted nature of benchmarking spanning systems, models, and data. Benchmarking is important to advancing AI by providing the essential measurements to track progress.\nML system benchmarks enable optimization across speed, Efficiency, and scalability metrics. Model benchmarks drive innovation through standardized tasks and metrics beyond accuracy. Data benchmarks highlight issues of quality, balance, and representation.\nImportantly, evaluating these components in isolation has limitations. In the future, more integrated benchmarking will likely be used to explore the interplay between system, model, and data benchmarks. This view promises new insights into co-designing data, algorithms, and infrastructure.\nAs AI grows more complex, comprehensive benchmarking becomes even more critical. Standards must continuously evolve to measure new capabilities and reveal limitations. Close collaboration between industry, academics, national labels, etc., is essential to developing benchmarks that are rigorous, transparent, and socially beneficial.\nBenchmarking provides the compass to guide progress in AI. By persistently measuring and openly sharing results, we can navigate toward performant, robust, and trustworthy systems. If AI is to serve societal and human needs properly, it must be benchmarked with humanity’s best interests in mind. To this end, there are emerging areas, such as benchmarking the safety of AI systems, but that’s for another day and something we can discuss further in Generative AI!\nBenchmarking is a continuously evolving topic. The article The Olympics of AI: Benchmarking Machine Learning Systems covers several emerging subfields in AI benchmarking, including robotics, extended reality, and neuromorphic computing that we encourage the reader to pursue.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1241,9 +1241,9 @@ "href": "contents/benchmarking/benchmarking.html#sec-benchmarking-ai-resource", "title": "11  Benchmarking AI", "section": "11.10 Resources", - "text": "11.10 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nWhy is benchmarking important?\nEmbedded inference benchmarking.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 11.1\nExercise 11.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "11.10 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nWhy is benchmarking important?\nEmbedded inference benchmarking.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 11.1\nExercise 11.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Deployment", "11  Benchmarking AI" ] }, @@ -1254,7 +1254,7 @@ "section": "", "text": "12.1 Introduction\nOn-device Learning refers to training ML models directly on the device where they are deployed, as opposed to traditional methods where models are trained on powerful servers and then deployed to devices. This method is particularly relevant to TinyML, where ML systems are integrated into tiny, resource-constrained devices.\nAn example of On-Device Learning can be seen in a smart thermostat that adapts to user behavior over time. Initially, the thermostat may have a generic model that understands basic usage patterns. However, as it is exposed to more data, such as the times the user is home or away, preferred temperatures, and external weather conditions, the thermostat can refine its model directly on the device to provide a personalized experience. This is all done without sending data back to a central server for processing.\nAnother example is in predictive text on smartphones. As users type, the phone learns from the user’s language patterns and suggests words or phrases that are likely to be used next. This learning happens directly on the device, and the model updates in real-time as more data is collected. A widely used real-world example of on-device learning is Gboard. On an Android phone, Gboard learns from typing and dictation patterns to enhance the experience for all users. On-device learning is also called federated learning. Figure 12.1 shows the cycle of federated learning on mobile devices: A. the device learns from user patterns; B. local model updates are communicated to the cloud; C. the cloud server updates the global model and sends the new model to all the devices.", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1265,7 +1265,7 @@ "section": "", "text": "Figure 12.1: Federated learning cycle. Credit: Google Research.", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1276,7 +1276,7 @@ "section": "12.2 Advantages and Limitations", "text": "12.2 Advantages and Limitations\nOn-device learning provides several advantages over traditional cloud-based ML. By keeping data and models on the device, it eliminates the need for costly data transmission and addresses privacy concerns. This allows for more personalized, responsive experiences, as the model can adapt in real-time to user behavior.\nHowever, On-Device Learning also comes with tradeoffs. The limited computing resources on consumer devices can make it challenging to run complex models locally. Datasets are also more restricted since they consist only of user-generated data from a single device. Additionally, updating models requires pushing out new versions rather than seamless cloud updates.\nOn-device learning opens up new capabilities by enabling offline AI while maintaining user privacy. However, it requires carefully managing model and data complexity within the constraints of consumer devices. Finding the right balance between localization and cloud offloading is key to optimizing on-device experiences.\n\n12.2.1 Benefits\n\nPrivacy and Data Security\nOne of the significant advantages of on-device learning is the enhanced privacy and security of user data. For instance, consider a smartwatch that monitors sensitive health metrics such as heart rate and blood pressure. By processing data and adapting models directly on the device, the biometric data remains localized, circumventing the need to transmit raw data to cloud servers where it could be susceptible to breaches.\nServer breaches are far from rare, with millions of records compromised annually. For example, the 2017 Equifax breach exposed the personal data of 147 million people. By keeping data on the device, the risk of such exposures is drastically minimized. On-device learning eliminates reliance on centralized cloud storage and safeguards against unauthorized access from various threats, including malicious actors, insider threats, and accidental exposure.\nRegulations like the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) mandate stringent data privacy requirements that on-device learning adeptly addresses. By ensuring data remains localized and is not transferred to other systems, on-device learning facilitates compliance with these regulations.\nOn-device learning is not just beneficial for individual users; it has significant implications for organizations and sectors dealing with highly sensitive data. For instance, within the military, on-device learning empowers frontline systems to adapt models and function independently of connections to central servers that could potentially be compromised. Critical and sensitive information is staunchly protected by localizing data processing and learning. However, this comes with the tradeoff that individual devices take on more value and may incentivize theft or destruction as they become the sole carriers of specialized AI models. Care must be taken to secure devices themselves when transitioning to on-device learning.\nIt is also important to preserve the privacy, security, and regulatory compliance of personal and sensitive data. Instead of in the cloud, training and operating models locally substantially augment privacy measures, ensuring that user data is safeguarded from potential threats.\nHowever, this is only partially intuitive because on-device learning could instead open systems up to new privacy attacks. With valuable data summaries and model updates permanently stored on individual devices, it may be much harder to physically and digitally protect them than a large computing cluster. While on-device learning reduces the amount of data compromised in any one breach, it could also introduce new dangers by dispersing sensitive information across many decentralized endpoints. Careful security practices are still essential for on-device systems.\n\n\nRegulatory Compliance\nOn-device learning helps address major privacy regulations like (GDPR) and CCPA. These regulations require data localization, restricting cross-border data transfers to approved countries with adequate controls. GDPR also mandates privacy by design and consent requirements for data collection. By keeping data processing and model training localized on-device, sensitive user data is not transferred across borders. This avoids major compliance headaches for organizations.\nFor example, a healthcare provider monitoring patient vitals with wearables must ensure cross-border data transfers comply with HIPAA and GDPR if using the cloud. Determining which country’s laws apply and securing approvals for international data flows introduces legal and engineering burdens. With on-device learning, no data leaves the device, simplifying compliance. The time and resources spent on compliance are reduced significantly.\nIndustries like healthcare, finance, and government, which have highly regulated data, can benefit greatly from on-device learning. By localizing data and learning, regulatory privacy and data sovereignty requirements are more easily met. On-device solutions provide an efficient way to build compliant AI applications.\nMajor privacy regulations impose restrictions on cross-border data movement that on-device learning inherently addresses through localized processing. This reduces the compliance burden for organizations working with regulated data.\n\n\nReduced Bandwidth, Costs, and Increased Efficiency\nOne major advantage of on-device learning is the significant reduction in bandwidth usage and associated cloud infrastructure costs. By keeping data localized for model training rather than transmitting raw data to the cloud, on-device learning can result in substantial bandwidth savings. For instance, a network of cameras analyzing video footage can achieve significant reductions in data transfer by training models on-device rather than streaming all video footage to the cloud for processing.\nThis reduction in data transmission saves bandwidth and translates to lower costs for servers, networking, and data storage in the cloud. Large organizations, which might spend millions on cloud infrastructure to train models on-device data, can experience dramatic cost reductions through on-device learning. In the era of Generative AI, where costs have been escalating significantly, finding ways to keep expenses down has become increasingly important.\nFurthermore, the energy and environmental costs of running large server farms are also diminished. Data centers consume vast amounts of energy, contributing to greenhouse gas emissions. By reducing the need for extensive cloud-based infrastructure, on-device learning plays a part in mitigating the environmental impact of data processing (Wu et al. 2022).\n\nWu, Carole-Jean, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, et al. 2022. “Sustainable Ai: Environmental Implications, Challenges and Opportunities.” Proceedings of Machine Learning and Systems 4: 795–813.\nSpecifically for endpoint applications, on-device learning minimizes the number of network API calls needed to run inference through a cloud provider. The cumulative costs associated with bandwidth and API calls can quickly escalate for applications with millions of users. In contrast, performing training and inferences locally is considerably more efficient and cost-effective. Under state-of-the-art optimizations, on-device learning has been shown to reduce training memory requirements, drastically improve memory efficiency, and reduce up to 20% in per-iteration latency (Dhar et al. 2021).\nAnother key benefit of on-device learning is the potential for IoT devices to continuously adapt their ML model to new data for continuous, lifelong learning. On-device models can quickly become outdated as user behavior, data patterns, and preferences change. Continuous learning enables the model to efficiently adapt to new data and improvements and maintain high model performance over time.\n\n\n\n12.2.2 Limitations\nWhile traditional cloud-based ML systems have access to nearly endless computing resources, on-device learning is often restricted by the limitations in computational and storage power of the edge device that the model is trained on. By definition, an edge device is a device with restrained computing, memory, and energy resources that cannot be easily increased or decreased. Thus, the reliance on edge devices can restrict the complexity, efficiency, and size of on-device ML models.\n\nCompute resources\nTraditional cloud-based ML systems utilize large servers with multiple high-end GPUs or TPUs, providing nearly endless computational power and memory. For example, services like Amazon Web Services (AWS) EC2 allow configuring clusters of GPU instances for massively parallel training.\nIn contrast, on-device learning is restricted by the hardware limitations of the edge device on which it runs. Edge devices refer to endpoints like smartphones, embedded electronics, and IoT devices. By definition, these devices have highly restrained computing, memory, and energy resources compared to the cloud.\nFor example, a typical smartphone or Raspberry Pi may only have a few CPU cores, a few GB of RAM, and a small battery. Even more resource-constrained are TinyML microcontroller devices such as the Arduino Nano BLE Sense. The resources are fixed on these devices and can’t easily be increased on demand, such as scaling cloud infrastructure. This reliance on edge devices directly restricts the complexity, efficiency, and size of models that can be deployed for on-device training:\n\nComplexity: Limits on memory, computing, and power restrict model architecture design, constraining the number of layers and parameters.\nEfficiency: Models must be heavily optimized through methods like quantization and pruning to run faster and consume less energy.\nSize: Actual model files must be compressed as much as possible to fit within the storage limitations of edge devices.\n\nThus, while the cloud offers endless scalability, on-device learning must operate within the tight resource constraints of endpoint hardware. This requires careful codesign of streamlined models, training methods, and optimizations tailored specifically for edge devices.\n\n\nDataset Size, Accuracy, and Generalization\nIn addition to limited computing resources, on-device learning is also constrained by the dataset available for training models.\nIn the cloud, models are trained on massive, diverse datasets like ImageNet or Common Crawl. For example, ImageNet contains over 14 million images carefully categorized across thousands of classes.\nOn-device learning instead relies on smaller, decentralized data silos unique to each device. A smartphone camera roll may contain only thousands of photos of users’ interests and environments.\nThis decentralized data leads to a need for IID (independent and identically distributed) data. For instance, two friends may take many photos of the same places and objects, meaning their data distributions are highly correlated rather than independent.\nReasons data may be non-IID in on-device settings:\n\nUser heterogeneity: Different users have different interests and environments.\nDevice differences: Sensors, regions, and demographics affect data.\nTemporal effects: time of day, seasonal impacts on data.\n\nThe effectiveness of ML relies heavily on large, diverse training data. With small, localized datasets, on-device models may fail to generalize across different user populations and environments. For example, a disease detection model trained only on images from a single hospital would not generalize well to other patient demographics. Withel’s real-world performance would only improve with extensive, diverse medical improvement. Thus, while cloud-based learning leverages massive datasets, on-device learning relies on much smaller, decentralized data silos unique to each user.\nThe limited data and optimizations required for on-device learning can negatively impact model accuracy and generalization:\n\nSmall datasets increase overfitting risk. For example, a fruit classifier trained on 100 images risks overfitting compared to one trained on 1 million diverse images.\nNoisy user-generated data reduces quality. Sensor noise or improper data labeling by non-experts may degrade training.\nOptimizations like pruning and quantization trade off accuracy for efficiency. An 8-bit quantized model runs faster but less accurately than a 32-bit model.\n\nSo while cloud models achieve high accuracy with massive datasets and no constraints, on-device models can struggle to generalize. Some studies show that on-device training matches cloud accuracy on select tasks. However, performance on real-world workloads requires further study (Lin et al. 2022).\nFor instance, a cloud model can accurately detect pneumonia in chest X-rays from thousands of hospitals. However, an on-device model trained only on a small local patient population may fail to generalize.\nUnreliable accuracy limits the real-world applicability of on-device learning for mission-critical uses like disease diagnosis or self-driving vehicles.\nOn-device training is also slower than the cloud due to limited resources. Even if each iteration is faster, the overall training process takes longer.\nFor example, a real-time robotics application may require model updates within milliseconds. On-device training on small embedded hardware may take seconds or minutes per update - too slow for real-time use.\nAccuracy, generalization, and speed challenges pose hurdles to adopting on-device learning for real-world production systems, especially when reliability and low latency are critical.", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1287,7 +1287,7 @@ "section": "12.3 On-device Adaptation", "text": "12.3 On-device Adaptation\nIn an ML task, resource consumption mainly comes from three sources:\n\nThe ML model itself;\nThe optimization process during model learning\nStoring and processing the dataset used for learning.\n\nCorrespondingly, there are three approaches to adapting existing ML algorithms onto resource-constrained devices:\n\nReducing the complexity of the ML model\nModifying optimizations to reduce training resource requirements\nCreating new storage-efficient data representations\n\nIn the following section, we will review these on-device learning adaptation methods. The Model Optimizations chapter provides more details on model optimizations.\n\n12.3.1 Reducing Model Complexity\nIn this section, we will briefly discuss ways to reduce model complexity when adapting ML models on-device. For details on reducing model complexity, please refer to the Model Optimization Chapter.\n\nTraditional ML Algorithms\nDue to edge devices’ computing and memory limitations, select traditional ML algorithms are great candidates for on-device learning applications due to their lightweight nature. Some example algorithms with low resource footprints include Naive Bayes Classifiers, Support Vector Machines (SVMs), Linear Regression, Logistic Regression, and select Decision Tree algorithms.\nWith some refinements, these classical ML algorithms can be adapted to specific hardware architectures and perform simple tasks. Their low-performance requirements make it easy to integrate continuous learning even on edge devices.\n\n\nPruning\nPruning is a technique for reducing the size and complexity of an ML model to improve its efficiency and generalization performance. This is beneficial for training models on edge devices, where we want to minimize resource usage while maintaining competitive accuracy.\nThe primary goal of pruning is to remove parts of the model that do not contribute significantly to its predictive power while retaining the most informative aspects. In the context of decision trees, pruning involves removing some branches (subtrees) from the tree, leading to a smaller and simpler tree. In the context of DNN, pruning is used to reduce the number of neurons (units) or connections in the network, as shown in Figure 12.2.\n\n\n\n\n\n\nFigure 12.2: Network pruning.\n\n\n\n\n\nReducing Complexity of Deep Learning Models\nTraditional cloud-based DNN frameworks have too much memory overhead to be used on-device. For example, deep learning systems like PyTorch and TensorFlow require hundreds of megabytes of memory overhead when training models such as MobilenetV2, and the overhead scales as the number of training parameters increases.\nTraditional cloud-based DNN frameworks have too much memory overhead to be used on-device. For example, deep learning systems like PyTorch and TensorFlow require hundreds of megabytes of memory overhead when training models such as MobilenetV2-w0.35, and the overhead scales as the number of training parameters increases.\nCurrent research for lightweight DNNs mostly explores CNN architectures. Several bare-metal frameworks designed for running Neural Networks on MCUs by keeping computational overhead and memory footprint low also exist. Some examples include MNN, TVM, and TensorFlow Lite. However, they can only perform inference during forward passes and lack support for backpropagation. While these models are designed for edge deployment, their reduction in model weights and architectural connections led to reduced resource requirements for continuous learning.\nThe tradeoff between performance and model support is clear when adapting the most popular DNN systems. How do we adapt existing DNN models to resource-constrained settings while maintaining support for backpropagation and continuous learning? The latest research suggests algorithm and system codesign techniques that help reduce the resource consumption of ML training on edge devices. Utilizing techniques such as quantization-aware scaling (QAS), sparse updates, and other cutting-edge techniques, on-device learning is possible on embedded systems with a few hundred kilobytes of RAM without additional memory while maintaining high accuracy.\n\n\n\n12.3.2 Modifying Optimization Processes\nChoosing the right optimization strategy is important for DNN training on a device since this allows for finding a good local minimum. Since training occurs on a device, this strategy must also consider limited memory and power.\n\nQuantization-Aware Scaling\nQuantization is a common method for reducing the memory footprint of DNN training. Although this could introduce new errors, these errors can be mitigated by designing a model to characterize this statistical error. For example, models could use stochastic rounding or introduce the quantization error into the gradient updates.\nA specific algorithmic technique is Quantization-Aware Scaling (QAS), which improves the performance of neural networks on low-precision hardware, such as edge devices, mobile devices, or TinyML systems, by adjusting the scale factors during the quantization process.\nAs we discussed in the Model Optimizations chapter, quantization is the process of mapping a continuous range of values to a discrete set of values. In the context of neural networks, quantization often involves reducing the precision of the weights and activations from 32-bit floating point to lower-precision formats such as 8-bit integers. This reduction in precision can significantly reduce the computational cost and memory footprint of the model, making it suitable for deployment on low-precision hardware. Figure 12.3 is an example of float-to-integer quantization.\n\n\n\n\n\n\nFigure 12.3: Float to integer qunatization. Credit: Nvidia.\n\n\n\nHowever, the quantization process can also introduce quantization errors that can degrade the model’s performance. Quantization-aware scaling is a technique that aims to minimize these errors by adjusting the scale factors used in the quantization process.\nThe QAS process involves two main steps:\n\nQuantization-aware training: In this step, the neural network is trained with quantization in mind, using simulated quantization to mimic the effects of quantization during the forward and backward passes. This allows the model to learn to compensate for the quantization errors and improve its performance on low-precision hardware. Refer to the QAT section in Model Optimizations for details.\nQuantization and scaling: After training, the model is quantized to a low-precision format, and the scale factors are adjusted to minimize the quantization errors. The scale factors are chosen based on the distribution of the weights and activations in the model and are adjusted to ensure that the quantized values are within the range of the low-precision format.\n\nQAS is used to overcome the difficulties of optimizing models on tiny devices without needing hyperparameter tuning; QAS automatically scales tensor gradients with various bit precisions. This stabilizes the training process and matches the accuracy of floating-point precision.\n\n\nSparse Updates\nAlthough QAS enables the optimization of a quantized model, it uses a large amount of memory, which is unrealistic for on-device training. So, spare updates are used to reduce the memory footprint of full backward computation. Instead of pruning weights for inference, sparse update prunes the gradient during backward propagation to update the model sparsely. In other words, sparse update skips computing gradients of less important layers and sub-tensors.\nHowever, determining the optimal sparse update scheme given a constraining memory budget can be challenging due to the large search space. For example, the MCUNet model has 43 convolutional layers and a search space of approximately 1030. One technique to address this issue is contribution analysis. Contribution analysis measures the accuracy improvement from biases (updating the last few biases compared to only updating the classifier) and weights (updating the weight of one extra layer compared to only having a bias update). By trying to maximize these improvements, contribution analysis automatically derives an optimal sparse update scheme for enabling on-device training.\n\n\nLayer-Wise Training\nOther methods besides quantization can help optimize routines. One such method is layer-wise training. A significant memory consumer of DNN training is end-to-end backpropagation, which requires all intermediate feature maps to be stored so the model can calculate gradients. An alternative to this approach that reduces the memory footprint of DNN training is sequential layer-by-layer training (T. Chen et al. 2016). Instead of training end-to-end, training a single layer at a time helps avoid having to store intermediate feature maps.\n\nChen, Tianqi, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. “Training Deep Nets with Sublinear Memory Cost.” ArXiv Preprint abs/1604.06174. https://arxiv.org/abs/1604.06174.\n\n\nTrading Computation for Memory\nThe strategy of trading computation for memory involves releasing some of the memory being used to store intermediate results. Instead, these results can be recomputed as needed. Reducing memory in exchange for more computation is shown to reduce the memory footprint of DNN training to fit into almost any budget while also minimizing computational cost (Gruslys et al. 2016).\n\nGruslys, Audrunas, Rémi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 2016. “Memory-Efficient Backpropagation Through Time.” In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, edited by Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett, 4125–33. https://proceedings.neurips.cc/paper/2016/hash/a501bebf79d570651ff601788ea9d16d-Abstract.html.\n\n\n\n12.3.3 Developing New Data Representations\nThe dimensionality and volume of the training data can significantly impact on-device adaptation. So, another technique for adapting models onto resource-constrained devices is to represent datasets more efficiently.\n\nData Compression\nThe goal of data compression is to reach high accuracies while limiting the amount of training data. One method to achieve this is prioritizing sample complexity: the amount of training data required for the algorithm to reach a target accuracy (Dhar et al. 2021).\n\nDhar, Sauptik, Junyao Guo, Jiayi (Jason) Liu, Samarth Tripathi, Unmesh Kurup, and Mohak Shah. 2021. “A Survey of on-Device Machine Learning: An Algorithms and Learning Theory Perspective.” ACM Transactions on Internet of Things 2 (3): 1–49. https://doi.org/10.1145/3450494.\n\nDarvish Rouhani, Bita, Azalia Mirhoseini, and Farinaz Koushanfar. 2017. “TinyDL: Just-in-time Deep Learning Solution for Constrained Embedded Systems.” In 2017 IEEE International Symposium on Circuits and Systems (ISCAS), 1–4. IEEE. https://doi.org/10.1109/iscas.2017.8050343.\n\nLi, Xiang, Tao Qin, Jian Yang, and Tie-Yan Liu. 2016. “LightRNN: Memory and Computation-Efficient Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, edited by Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett, 4385–93. https://proceedings.neurips.cc/paper/2016/hash/c3e4035af2a1cde9f21e1ae1951ac80b-Abstract.html.\nOther more common methods of data compression focus on reducing the dimensionality and the volume of the training data. For example, an approach could take advantage of matrix sparsity to reduce the memory footprint of storing training data. Training data can be transformed into a lower-dimensional embedding and factorized into a dictionary matrix multiplied by a block-sparse coefficient matrix (Darvish Rouhani, Mirhoseini, and Koushanfar 2017). Another example could involve representing words from a large language training dataset in a more compressed vector format (Li et al. 2016).", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1296,9 +1296,9 @@ "href": "contents/ondevice_learning/ondevice_learning.html#transfer-learning", "title": "12  On-Device Learning", "section": "12.4 Transfer Learning", - "text": "12.4 Transfer Learning\nTransfer learning is an ML technique in which a model developed for a particular task is reused as the starting point for a model on a second task. In the context of on-device AI, transfer learning allows us to leverage pre-trained models that have already learned useful representations from large datasets and finetune them for specific tasks using smaller datasets directly on the device. This can significantly reduce the computational resources and time required for training models from scratch.\nFigure 12.4 includes some intuitive examples of transfer learning from the real world. For instance, if you can ride a bicycle, you know how to balance yourself on two-wheel vehicles. Then, it would be easier for you to learn how to ride a motorcycle than it would be for someone who cannot ride a bicycle.\n\n\n\n\n\n\nFigure 12.4: Transferring knowledge between tasks. Credit: Zhuang et al. (2021).\n\n\nZhuang, Fuzhen, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. 2021. “A Comprehensive Survey on Transfer Learning.” Proc. IEEE 109 (1): 43–76. https://doi.org/10.1109/jproc.2020.3004555.\n\n\nLet’s take the example of a smart sensor application that uses on-device AI to recognize objects in images captured by the device. Traditionally, this would require sending the image data to a server, where a large neural network model processes the data and sends back the results. With on-device AI, the model is stored and runs directly on-device, eliminating the need to send data to a server.\nIf we want to customize the model for the on-device characteristics, training a neural network model from scratch on the device would be impractical due to the limited computational resources and battery life. This is where transfer learning comes in. Instead of training a model from scratch, we can take a pre-trained model, such as a convolutional neural network (CNN) or a transformer network trained on a large dataset of images, and finetune it for our specific object recognition task. This finetuning can be done directly on the device using a smaller dataset of images relevant to the task. By leveraging the pre-trained model, we can reduce the computational resources and time required for training while still achieving high accuracy for the object recognition task.\nTransfer learning is important in making on-device AI practical by allowing us to leverage pre-trained models and finetune them for specific tasks, thereby reducing the computational resources and time required for training. The combination of on-device AI and transfer learning opens up new possibilities for AI applications that are more privacy-conscious and responsive to user needs.\nTransfer learning has revolutionized the way models are developed and deployed, both in the cloud and at the edge. Transfer learning is being used in the real world. One such example is the use of transfer learning to develop AI models that can detect and diagnose diseases from medical images, such as X-rays, MRI scans, and CT scans. For example, researchers at Stanford University developed a transfer learning model that can detect cancer in skin images with an accuracy of 97% (Esteva et al. 2017). This model was pre-trained on 1.28 million images to classify a broad range of objects and then specialized for cancer detection by training on a dermatologist-curated dataset of skin images.\n\nEsteva, Andre, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. 2017. “Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks.” Nature 542 (7639): 115–18. https://doi.org/10.1038/nature21056.\nImplementation in production scenarios can be broadly categorized into two stages: pre-deployment and post-deployment.\n\n12.4.1 Pre-Deployment Specialization\nIn the pre-deployment stage, transfer learning acts as a catalyst to expedite the development process. Here’s how it typically works: Imagine we are creating a system to recognize different breeds of dogs. Rather than starting from scratch, we can utilize a pre-trained model that has already mastered the broader task of recognizing animals in images.\nThis pre-trained model serves as a solid foundation and contains a wealth of knowledge acquired from extensive data. We then finetune this model using a specialized dataset containing images of various dog breeds. This finetuning process tailors the model to our specific need — precisely identifying dog breeds. Once finetuned and validated to meet performance criteria, this specialized model is then ready for deployment.\nHere’s how it works in practice:\n\nStart with a Pre-Trained Model: Begin by selecting a model that has already been trained on a comprehensive dataset, usually related to a general task. This model serves as the foundation for the task at hand.\nFinetuning: The pre-trained model is then finetuned on a smaller, more specialized dataset specific to the desired task. This step allows the model to adapt and specialize its knowledge to the specific requirements of the application.\nValidation: After finetuning, the model is validated to ensure it meets the performance criteria for the specialized task.\nDeployment: Once validated, the specialized model is then deployed into the production environment.\n\nThis method significantly reduces the time and computational resources required to train a model from scratch (Pan and Yang 2010). By adopting transfer learning, embedded systems can achieve high accuracy on specialized tasks without the need to gather extensive data or expend significant computational resources on training from the ground up.\n\nPan, Sinno Jialin, and Qiang Yang. 2010. “A Survey on Transfer Learning.” IEEE Trans. Knowl. Data Eng. 22 (10): 1345–59. https://doi.org/10.1109/tkde.2009.191.\n\n\n12.4.2 Post-Deployment Adaptation\nDeployment to a device need not mark the culmination of an ML model’s educational trajectory. With the advent of transfer learning, we open the doors to the deployment of adaptive ML models in real-world scenarios, catering to users’ personalized needs.\nConsider a real-world application where a parent wishes to identify their child in a collection of images from a school event on their smartphone. In this scenario, the parent is faced with the challenge of locating their child amidst images of many other children. Transfer learning can be employed here to finetune an embedded system’s model to this unique and specialized task. Initially, the system might use a generic model trained to recognize faces in images. However, with transfer learning, the system can adapt this model to recognize the specific features of the user’s child.\nHere’s how it works:\n\nData Collection: The embedded system gathers images that include the child, ideally with the parent’s input to ensure accuracy and relevance. This can be done directly on the device, maintaining the user’s data privacy.\nModel Finetuning: The pre-existing face recognition model, which has been trained on a large and diverse dataset, is then finetuned using the newly collected images of the child. This process adapts the model to recognize the child’s specific facial features, distinguishing them from other children in the images.\nValidation: The refined model is then validated to ensure it accurately recognizes the child in various images. This can involve the parent verifying the model’s performance and providing feedback for further improvements.\nDeployment: Once validated, the adapted model is deployed on the device, enabling the parent to easily identify their child in images without having to sift through them manually.\n\nThis on-the-fly customization enhances the model’s efficacy for the individual user, ensuring that they benefit from ML personalization. This is, in part, how iPhotos or Google Photos works when they ask us to recognize a face, and then, based on that information, they index all the photos by that face. Because the learning and adaptation occur on the device itself, there are no risks to personal privacy. The parent’s images are not uploaded to a cloud server or shared with third parties, protecting the family’s privacy while still reaping the benefits of a personalized ML model. This approach represents a significant step forward in the quest to provide users with tailored ML solutions that respect and uphold their privacy.\n\n\n12.4.3 Benefits\nTransfer learning has become an important technique in ML and artificial intelligence, and it is particularly valuable for several reasons.\n\nData Scarcity: In many real-world scenarios, acquiring a sufficiently large labeled dataset to train an ML model from scratch is challenging. Transfer learning mitigates this issue by allowing the use of pre-trained models that have already learned valuable features from a vast dataset.\nComputational Expense: Training a model from scratch requires significant computational resources and time, especially for complex models like deep neural networks. By using transfer learning, we can leverage the computation that has already been done during the training of the source model, thereby saving both time and computational power.\nLimited Annotated Data: For some specific tasks, there might be ample raw data available, but the process of labeling that data for supervised learning can be costly and time-consuming. Transfer learning enables us to utilize pre-trained models that have been trained on a related task with labeled data, hence requiring less annotated data for the new task.\n\nThere are advantages to reusing the features:\n\nHierarchical Feature Learning: Deep learning models, particularly Convolutional Neural Networks (CNNs), can learn hierarchical features. Lower layers typically learn generic features like edges and shapes, while higher layers learn more complex and task-specific features. Transfer learning allows us to reuse the generic features learned by a model and finetune the higher layers for our specific task.\nBoosting Performance: Transfer learning has been proven to boost the performance of models on tasks with limited data. The knowledge gained from the source task can provide a valuable starting point and lead to faster convergence and improved accuracy on the target task.\n\n\n\n\n\n\n\nExercise 12.1: Transfer Learning\n\n\n\n\n\nImagine training an AI to recognize flowers like a pro, but without needing a million flower pictures! That’s the power of transfer learning. In this Colab, we’ll take an AI that already knows about images and teach it to become a flower expert with less effort. Get ready to make your AI smarter, not harder!\n\n\n\n\n\n\n12.4.4 Core Concepts\nUnderstanding the core concepts of transfer learning is essential for effectively utilizing this powerful approach in ML. Here, we’ll break down some of the main principles and components that underlie the process of transfer learning.\n\nSource and Target Tasks\nIn transfer learning, there are two main tasks involved: the source task and the target task. The source task is the task for which the model has already been trained and has learned valuable information. The target task is the new task we want the model to perform. The goal of transfer learning is to leverage the knowledge gained from the source task to improve performance on the target task.\nSuppose we have a model trained to recognize various fruits in images (source task), and we want to create a new model to recognize different vegetables in images (target task). In that case, we can use transfer learning to leverage the knowledge gained during the fruit recognition task to improve the performance of the vegetable recognition model.\n\n\nRepresentation Transfer\nRepresentation transfer is about transferring the learned representations (features) from the source task to the target task. There are three main types of representation transfer:\n\nInstance Transfer: This involves reusing the data instances from the source task in the target task.\nFeature-Representation Transfer: This involves transferring the learned feature representations from the source task to the target task.\nParameter Transfer: This involves transferring the model’s learned parameters (weights) from the source task to the target task.\n\nIn natural language processing, a model trained to understand the syntax and grammar of a language (source task) can have its learned representations transferred to a new model designed to perform sentiment analysis (target task).\n\n\nFinetuning\nFinetuning is the process of adjusting the parameters of a pre-trained model to adapt it to the target task. This typically involves updating the weights of the model’s layers, especially the last few layers, to make the model more relevant for the new task. In image classification, a model pre-trained on a general dataset like ImageNet (source task) can be finetuned by adjusting the weights of its layers to perform well on a specific classification task, like recognizing specific animal species (target task).\n\n\nFeature Extractions\nFeature extraction involves using a pre-trained model as a fixed feature extractor, where the output of the model’s intermediate layers is used as features for the target task. This approach is particularly useful when the target task has a small dataset, as the pre-trained model’s learned features can significantly enhance performance. In medical image analysis, a model pre-trained on a large dataset of general medical images (source task) can be used as a feature extractor to provide valuable features for a new model designed to recognize specific types of tumors in X-ray images (target task).\n\n\n\n12.4.5 Types of Transfer Learning\nTransfer learning can be classified into three main types based on the nature of the source and target tasks and data. Let’s explore each type in detail:\n\nInductive Transfer Learning\nIn inductive transfer learning, the goal is to learn the target predictive function with the help of source data. It typically involves finetuning a pre-trained model on the target task with available labeled data. A common example of inductive transfer learning is image classification tasks. For instance, a model pre-trained on the ImageNet dataset (source task) can be finetuned to classify specific types of birds (target task) using a smaller labeled dataset of bird images.\n\n\nTransductive Transfer Learning\nTransductive transfer learning involves using source and target data, but only the source task. The main aim is to transfer knowledge from the source domain to the target domain, even though the tasks remain the same. Sentiment analysis for different languages can serve as an example of transductive transfer learning. A model trained to perform sentiment analysis in English (source task) can be adapted to perform sentiment analysis in another language, like French (target task), by leveraging parallel datasets of English and French sentences with the same sentiments.\n\n\nUnsupervised Transfer Learning\nUnsupervised transfer learning is used when the source and target tasks are related, but there is no labeled data available for the target task. The goal is to leverage the knowledge gained from the source task to improve performance on the target task, even without labeled data. An example of unsupervised transfer learning is topic modeling in text data. A model trained to extract topics from news articles (source task) can be adapted to extract topics from social media posts (target task) without needing labeled data for the social media posts.\n\n\nComparison and Tradeoffs\nBy leveraging these different types of transfer learning, practitioners can choose the approach that best fits the nature of their tasks and available data, ultimately leading to more effective and efficient ML models. So, in summary:\n\nInductive: different source and target tasks, different domains\nTransductive: different source and target tasks, same domain\nUnsupervised: unlabeled source data, transfers feature representations\n\nTable 12.1 presents a matrix that outlines in a bit more detail the similarities and differences between the types of transfer learning:\n\n\n\nTable 12.1: Comparison of transfer learning types.\n\n\n\n\n\n\n\n\n\n\n\n\nInductive Transfer Learning\nTransductive Transfer Learning\nUnsupervised Transfer Learning\n\n\n\n\nLabeled Data for Target Task\nRequired\nNot Required\nNot Required\n\n\nSource Task\nCan be different\nSame\nSame or Different\n\n\nTarget Task\nCan be different\nSame\nCan be different\n\n\nObjective\nImprove target task performance with source data\nTransfer knowledge from source to target domain\nLeverage source task to improve target task performance without labeled data\n\n\nExample\nImageNet to bird classification\nSentiment analysis in different languages\nTopic modeling for different text data\n\n\n\n\n\n\n\n\n\n12.4.6 Constraints and Considerations\nWhen engaging in transfer learning, there are several factors that must be considered to ensure successful knowledge transfer and model performance. Here’s a breakdown of some key factors:\n\nDomain Similarity\nDomain similarity refers to how closely related the source and target domains are. The more similar the domains, the more likely the transfer learning will be successful. Transferring knowledge from a model trained on images of outdoor scenes (source domain) to a new task that involves recognizing objects in indoor scenes (target domain) might be more successful than transferring knowledge from outdoor scenes to a task involving text analysis, as the domains (images vs. text) are quite different.\n\n\nTask Similarity\nTask similarity refers to how closely related the source and target tasks are. Similar tasks are likely to benefit more from transfer learning. A model trained to recognize different breeds of dogs (source task) can be more easily adapted to recognize different breeds of cats (target task) than it can be adapted to perform a completely different task like language translation.\n\n\nData Quality and Quantity\nThe quality and quantity of data available for the target task can significantly impact the success of transfer learning. More high-quality data can result in better model performance. Suppose we have a large dataset with clear, well-labeled images to recognize specific bird species. In that case, the transfer learning process will likely be more successful than if we have a small, noisy dataset.\n\n\nFeature Space Overlap\nFeature space overlap refers to how well the features learned by the source model align with the features needed for the target task. Greater overlap can lead to more successful transfer learning. A model trained on high-resolution images (source task) may not transfer well to a target task that involves low-resolution images, as the feature space (high-res vs. low-res) is different.\n\n\nModel Complexity\nThe complexity of the source model can also impact the success of transfer learning. Sometimes, a simpler model might transfer better than a complex one, as it is less likely to overfit the source task. For example, a simple convolutional neural network (CNN) model trained on image data (source task) may transfer more successfully to a new image classification task (target task) than a complex CNN with many layers, as the simpler model is less likely to overfit the source task.\nBy considering these factors, ML practitioners can make informed decisions about when and how to utilize transfer learning, ultimately leading to more successful model performance on the target task. The success of transfer learning hinges on the degree of similarity between the source and target domains. Overfitting is risky, especially when finetuning occurs on a limited dataset. On the computational front, certain pre-trained models, owing to their size, might not comfortably fit into the memory constraints of some devices or may run prohibitively slowly. Over time, as data evolves, there is potential for model drift, indicating the need for periodic re-training or ongoing adaptation.\nLearn more about transfer learning in the video below.", + "text": "12.4 Transfer Learning\nTransfer learning is an ML technique in which a model developed for a particular task is reused as the starting point for a model on a second task. In the context of on-device AI, transfer learning allows us to leverage pre-trained models that have already learned useful representations from large datasets and finetune them for specific tasks using smaller datasets directly on the device. This can significantly reduce the computational resources and time required for training models from scratch.\nFigure 12.4 includes some intuitive examples of transfer learning from the real world. For instance, if you can ride a bicycle, you know how to balance yourself on two-wheel vehicles. Then, it would be easier for you to learn how to ride a motorcycle than it would be for someone who cannot ride a bicycle.\n\n\n\n\n\n\nFigure 12.4: Transferring knowledge between tasks. Credit: Zhuang et al. (2021).\n\n\nZhuang, Fuzhen, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. 2021. “A Comprehensive Survey on Transfer Learning.” Proc. IEEE 109 (1): 43–76. https://doi.org/10.1109/jproc.2020.3004555.\n\n\nLet’s take the example of a smart sensor application that uses on-device AI to recognize objects in images captured by the device. Traditionally, this would require sending the image data to a server, where a large neural network model processes the data and sends back the results. With on-device AI, the model is stored and runs directly on-device, eliminating the need to send data to a server.\nIf we want to customize the model for the on-device characteristics, training a neural network model from scratch on the device would be impractical due to the limited computational resources and battery life. This is where transfer learning comes in. Instead of training a model from scratch, we can take a pre-trained model, such as a convolutional neural network (CNN) or a transformer network trained on a large dataset of images, and finetune it for our specific object recognition task. This finetuning can be done directly on the device using a smaller dataset of images relevant to the task. By leveraging the pre-trained model, we can reduce the computational resources and time required for training while still achieving high accuracy for the object recognition task.\nTransfer learning is important in making on-device AI practical by allowing us to leverage pre-trained models and finetune them for specific tasks, thereby reducing the computational resources and time required for training. The combination of on-device AI and transfer learning opens up new possibilities for AI applications that are more privacy-conscious and responsive to user needs.\nTransfer learning has revolutionized the way models are developed and deployed, both in the cloud and at the edge. Transfer learning is being used in the real world. One such example is the use of transfer learning to develop AI models that can detect and diagnose diseases from medical images, such as X-rays, MRI scans, and CT scans. For example, researchers at Stanford University developed a transfer learning model that can detect cancer in skin images with an accuracy of 97% (Esteva et al. 2017). This model was pre-trained on 1.28 million images to classify a broad range of objects and then specialized for cancer detection by training on a dermatologist-curated dataset of skin images.\n\nEsteva, Andre, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. 2017. “Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks.” Nature 542 (7639): 115–18. https://doi.org/10.1038/nature21056.\nImplementation in production scenarios can be broadly categorized into two stages: pre-deployment and post-deployment.\n\n12.4.1 Pre-Deployment Specialization\nIn the pre-deployment stage, transfer learning acts as a catalyst to expedite the development process. Here’s how it typically works: Imagine we are creating a system to recognize different breeds of dogs. Rather than starting from scratch, we can utilize a pre-trained model that has already mastered the broader task of recognizing animals in images.\nThis pre-trained model serves as a solid foundation and contains a wealth of knowledge acquired from extensive data. We then finetune this model using a specialized dataset containing images of various dog breeds. This finetuning process tailors the model to our specific need — precisely identifying dog breeds. Once finetuned and validated to meet performance criteria, this specialized model is then ready for deployment.\nHere’s how it works in practice:\n\nStart with a Pre-Trained Model: Begin by selecting a model that has already been trained on a comprehensive dataset, usually related to a general task. This model serves as the foundation for the task at hand.\nFinetuning: The pre-trained model is then finetuned on a smaller, more specialized dataset specific to the desired task. This step allows the model to adapt and specialize its knowledge to the specific requirements of the application.\nValidation: After finetuning, the model is validated to ensure it meets the performance criteria for the specialized task.\nDeployment: Once validated, the specialized model is then deployed into the production environment.\n\nThis method significantly reduces the time and computational resources required to train a model from scratch (Pan and Yang 2010). By adopting transfer learning, embedded systems can achieve high accuracy on specialized tasks without the need to gather extensive data or expend significant computational resources on training from the ground up.\n\nPan, Sinno Jialin, and Qiang Yang. 2010. “A Survey on Transfer Learning.” IEEE Trans. Knowl. Data Eng. 22 (10): 1345–59. https://doi.org/10.1109/tkde.2009.191.\n\n\n12.4.2 Post-Deployment Adaptation\nDeployment to a device need not mark the culmination of an ML model’s educational trajectory. With the advent of transfer learning, we open the doors to the deployment of adaptive ML models in real-world scenarios, catering to users’ personalized needs.\nConsider a real-world application where a parent wishes to identify their child in a collection of images from a school event on their smartphone. In this scenario, the parent is faced with the challenge of locating their child amidst images of many other children. Transfer learning can be employed here to finetune an embedded system’s model to this unique and specialized task. Initially, the system might use a generic model trained to recognize faces in images. However, with transfer learning, the system can adapt this model to recognize the specific features of the user’s child.\nHere’s how it works:\n\nData Collection: The embedded system gathers images that include the child, ideally with the parent’s input to ensure accuracy and relevance. This can be done directly on the device, maintaining the user’s data privacy.\nModel Finetuning: The pre-existing face recognition model, which has been trained on a large and diverse dataset, is then finetuned using the newly collected images of the child. This process adapts the model to recognize the child’s specific facial features, distinguishing them from other children in the images.\nValidation: The refined model is then validated to ensure it accurately recognizes the child in various images. This can involve the parent verifying the model’s performance and providing feedback for further improvements.\nDeployment: Once validated, the adapted model is deployed on the device, enabling the parent to easily identify their child in images without having to sift through them manually.\n\nThis on-the-fly customization enhances the model’s efficacy for the individual user, ensuring that they benefit from ML personalization. This is, in part, how iPhotos or Google Photos works when they ask us to recognize a face, and then, based on that information, they index all the photos by that face. Because the learning and adaptation occur on the device itself, there are no risks to personal privacy. The parent’s images are not uploaded to a cloud server or shared with third parties, protecting the family’s privacy while still reaping the benefits of a personalized ML model. This approach represents a significant step forward in the quest to provide users with tailored ML solutions that respect and uphold their privacy.\n\n\n12.4.3 Benefits\nTransfer learning has become an important technique in ML and artificial intelligence, and it is particularly valuable for several reasons.\n\nData Scarcity: In many real-world scenarios, acquiring a sufficiently large labeled dataset to train an ML model from scratch is challenging. Transfer learning mitigates this issue by allowing the use of pre-trained models that have already learned valuable features from a vast dataset.\nComputational Expense: Training a model from scratch requires significant computational resources and time, especially for complex models like deep neural networks. By using transfer learning, we can leverage the computation that has already been done during the training of the source model, thereby saving both time and computational power.\nLimited Annotated Data: For some specific tasks, there might be ample raw data available, but the process of labeling that data for supervised learning can be costly and time-consuming. Transfer learning enables us to utilize pre-trained models that have been trained on a related task with labeled data, hence requiring less annotated data for the new task.\n\nThere are advantages to reusing the features:\n\nHierarchical Feature Learning: Deep learning models, particularly Convolutional Neural Networks (CNNs), can learn hierarchical features. Lower layers typically learn generic features like edges and shapes, while higher layers learn more complex and task-specific features. Transfer learning allows us to reuse the generic features learned by a model and finetune the higher layers for our specific task.\nBoosting Performance: Transfer learning has been proven to boost the performance of models on tasks with limited data. The knowledge gained from the source task can provide a valuable starting point and lead to faster convergence and improved accuracy on the target task.\n\n\n\n\n\n\n\nExercise 12.1: Transfer Learning\n\n\n\n\n\nImagine training an AI to recognize flowers like a pro, but without needing a million flower pictures! That’s the power of transfer learning. In this Colab, we’ll take an AI that already knows about images and teach it to become a flower expert with less effort. Get ready to make your AI smarter, not harder!\n\n\n\n\n\n\n12.4.4 Core Concepts\nUnderstanding the core concepts of transfer learning is essential for effectively utilizing this powerful approach in ML. Here, we’ll break down some of the main principles and components that underlie the process of transfer learning.\n\nSource and Target Tasks\nIn transfer learning, there are two main tasks involved: the source task and the target task. The source task is the task for which the model has already been trained and has learned valuable information. The target task is the new task we want the model to perform. The goal of transfer learning is to leverage the knowledge gained from the source task to improve performance on the target task.\nSuppose we have a model trained to recognize various fruits in images (source task), and we want to create a new model to recognize different vegetables in images (target task). In that case, we can use transfer learning to leverage the knowledge gained during the fruit recognition task to improve the performance of the vegetable recognition model.\n\n\nRepresentation Transfer\nRepresentation transfer is about transferring the learned representations (features) from the source task to the target task. There are three main types of representation transfer:\n\nInstance Transfer: This involves reusing the data instances from the source task in the target task.\nFeature-Representation Transfer: This involves transferring the learned feature representations from the source task to the target task.\nParameter Transfer: This involves transferring the model’s learned parameters (weights) from the source task to the target task.\n\nIn natural language processing, a model trained to understand the syntax and grammar of a language (source task) can have its learned representations transferred to a new model designed to perform sentiment analysis (target task).\n\n\nFinetuning\nFinetuning is the process of adjusting the parameters of a pre-trained model to adapt it to the target task. This typically involves updating the weights of the model’s layers, especially the last few layers, to make the model more relevant for the new task. In image classification, a model pre-trained on a general dataset like ImageNet (source task) can be finetuned by adjusting the weights of its layers to perform well on a specific classification task, like recognizing specific animal species (target task).\n\n\nFeature Extractions\nFeature extraction involves using a pre-trained model as a fixed feature extractor, where the output of the model’s intermediate layers is used as features for the target task. This approach is particularly useful when the target task has a small dataset, as the pre-trained model’s learned features can significantly enhance performance. In medical image analysis, a model pre-trained on a large dataset of general medical images (source task) can be used as a feature extractor to provide valuable features for a new model designed to recognize specific types of tumors in X-ray images (target task).\n\n\n\n12.4.5 Types of Transfer Learning\nTransfer learning can be classified into three main types based on the nature of the source and target tasks and data. Let’s explore each type in detail:\n\nInductive Transfer Learning\nIn inductive transfer learning, the goal is to learn the target predictive function with the help of source data. It typically involves finetuning a pre-trained model on the target task with available labeled data. A common example of inductive transfer learning is image classification tasks. For instance, a model pre-trained on the ImageNet dataset (source task) can be finetuned to classify specific types of birds (target task) using a smaller labeled dataset of bird images.\n\n\nTransductive Transfer Learning\nTransductive transfer learning involves using source and target data, but only the source task. The main aim is to transfer knowledge from the source domain to the target domain, even though the tasks remain the same. Sentiment analysis for different languages can serve as an example of transductive transfer learning. A model trained to perform sentiment analysis in English (source task) can be adapted to perform sentiment analysis in another language, like French (target task), by leveraging parallel datasets of English and French sentences with the same sentiments.\n\n\nUnsupervised Transfer Learning\nUnsupervised transfer learning is used when the source and target tasks are related, but there is no labeled data available for the target task. The goal is to leverage the knowledge gained from the source task to improve performance on the target task, even without labeled data. An example of unsupervised transfer learning is topic modeling in text data. A model trained to extract topics from news articles (source task) can be adapted to extract topics from social media posts (target task) without needing labeled data for the social media posts.\n\n\nComparison and Tradeoffs\nBy leveraging these different types of transfer learning, practitioners can choose the approach that best fits the nature of their tasks and available data, ultimately leading to more effective and efficient ML models. So, in summary:\n\nInductive: different source and target tasks, different domains\nTransductive: different source and target tasks, same domain\nUnsupervised: unlabeled source data, transfers feature representations\n\nTable 12.1 presents a matrix that outlines in a bit more detail the similarities and differences between the types of transfer learning:\n\n\n\nTable 12.1: Comparison of transfer learning types.\n\n\n\n\n\n\n\n\n\n\n\n\nInductive Transfer Learning\nTransductive Transfer Learning\nUnsupervised Transfer Learning\n\n\n\n\nLabeled Data for Target Task\nRequired\nNot Required\nNot Required\n\n\nSource Task\nCan be different\nSame\nSame or Different\n\n\nTarget Task\nCan be different\nSame\nCan be different\n\n\nObjective\nImprove target task performance with source data\nTransfer knowledge from source to target domain\nLeverage source task to improve target task performance without labeled data\n\n\nExample\nImageNet to bird classification\nSentiment analysis in different languages\nTopic modeling for different text data\n\n\n\n\n\n\n\n\n\n12.4.6 Constraints and Considerations\nWhen engaging in transfer learning, there are several factors that must be considered to ensure successful knowledge transfer and model performance. Here’s a breakdown of some key factors:\n\nDomain Similarity\nDomain similarity refers to how closely related the source and target domains are. The more similar the domains, the more likely the transfer learning will be successful. Transferring knowledge from a model trained on images of outdoor scenes (source domain) to a new task that involves recognizing objects in indoor scenes (target domain) might be more successful than transferring knowledge from outdoor scenes to a task involving text analysis, as the domains (images vs. text) are quite different.\n\n\nTask Similarity\nTask similarity refers to how closely related the source and target tasks are. Similar tasks are likely to benefit more from transfer learning. A model trained to recognize different breeds of dogs (source task) can be more easily adapted to recognize different breeds of cats (target task) than it can be adapted to perform a completely different task like language translation.\n\n\nData Quality and Quantity\nThe quality and quantity of data available for the target task can significantly impact the success of transfer learning. More high-quality data can result in better model performance. Suppose we have a large dataset with clear, well-labeled images to recognize specific bird species. In that case, the transfer learning process will likely be more successful than if we have a small, noisy dataset.\n\n\nFeature Space Overlap\nFeature space overlap refers to how well the features learned by the source model align with the features needed for the target task. Greater overlap can lead to more successful transfer learning. A model trained on high-resolution images (source task) may not transfer well to a target task that involves low-resolution images, as the feature space (high-res vs. low-res) is different.\n\n\nModel Complexity\nThe complexity of the source model can also impact the success of transfer learning. Sometimes, a simpler model might transfer better than a complex one, as it is less likely to overfit the source task. For example, a simple convolutional neural network (CNN) model trained on image data (source task) may transfer more successfully to a new image classification task (target task) than a complex CNN with many layers, as the simpler model is less likely to overfit the source task.\nBy considering these factors, ML practitioners can make informed decisions about when and how to utilize transfer learning, ultimately leading to more successful model performance on the target task. The success of transfer learning hinges on the degree of similarity between the source and target domains. Overfitting is risky, especially when finetuning occurs on a limited dataset. On the computational front, certain pre-trained models, owing to their size, might not comfortably fit into the memory constraints of some devices or may run prohibitively slowly. Over time, as data evolves, there is potential for model drift, indicating the need for periodic re-training or ongoing adaptation.\nLearn more about transfer learning in Video 12.1 below.\n\n\n\n\n\n\nVideo 12.1: Transfer Learning", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1307,9 +1307,9 @@ "href": "contents/ondevice_learning/ondevice_learning.html#sec-fl", "title": "12  On-Device Learning", "section": "12.5 Federated Machine Learning", - "text": "12.5 Federated Machine Learning\nFederated Learning Overview\nThe modern internet is full of large networks of connected devices. Whether it’s cell phones, thermostats, smart speakers, or other IOT products, countless edge devices are a goldmine for hyper-personalized, rich data. However, with that rich data comes an assortment of problems with information transfer and privacy. Constructing a training dataset in the cloud from these devices would involve high volumes of bandwidth, cost-efficient data transfer, and violation of users’ privacy.\nFederated learning offers a solution to these problems: train models partially on the edge devices and only communicate model updates to the cloud. In 2016, a team from Google designed architecture for federated learning that attempts to address these problems.\nIn their initial paper, Google outlines a principle federated learning algorithm called FederatedAveraging, which is shown in Figure 12.5. Specifically, FederatedAveraging performs stochastic gradient descent (SGD) over several different edge devices. In this process, each device calculates a gradient \\(g_k = \\nabla F_k(w_t)\\) which is then applied to update the server-side weights as (with \\(\\eta\\) as learning rate across \\(k\\) clients): \\[\nw_{t+1} \\rightarrow w_t - \\eta \\sum_{k=1}^{K} \\frac{n_k}{n}g_k\n\\] This summarizes the basic algorithm for federated learning on the right. For each round of training, the server takes a random set of client devices and calls each client to train on its local batch using the most recent server-side weights. Those weights are then returned to the server, where they are collected individually and averaged to update the global model weights.\n\n\n\n\n\n\nFigure 12.5: Google’s Proposed FederatedAverage Algorithm. Credit: McMahan et al. (2017).\n\n\n\nWith this proposed structure, there are a few key vectors for further optimizing federated learning. We will outline each in the following subsections.\nThe following video is an overview of federated learning.\n\n\n12.5.1 Communication Efficiency\nOne of the key bottlenecks in federated learning is communication. Every time a client trains the model, they must communicate their updates back to the server. Similarly, once the server has averaged all the updates, it must send them back to the client. This incurs huge bandwidth and resource costs on large networks of millions of devices. As the field of federated learning advances, a few optimizations have been developed to minimize this communication. To address the footprint of the model, researchers have developed model compression techniques. In the client-server protocol, federated learning can also minimize communication through the selective sharing of updates on clients. Finally, efficient aggregation techniques can also streamline the communication process.\n\n\n12.5.2 Model Compression\nIn standard federated learning, the server communicates the entire model to each client, and then the client sends back all of the updated weights. This means that the easiest way to reduce the client’s memory and communication footprint is to minimize the size of the model needed to be communicated. We can employ all of the previously discussed model optimization strategies to do this.\nIn 2022, another team at Google proposed that each client communicates via a compressed format and decompresses the model on the fly for training (Yang et al. 2023), allocating and deallocating the full memory for the model only for a short period while training. The model is compressed through a range of various quantization strategies elaborated upon in their paper. Meanwhile, the server can update the uncompressed model by decompressing and applying updates as they come in.\n\nYang, Tien-Ju, Yonghui Xiao, Giovanni Motta, Françoise Beaufays, Rajiv Mathews, and Mingqing Chen. 2023. “Online Model Compression for Federated Learning with Large Models.” In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. IEEE; IEEE. https://doi.org/10.1109/icassp49357.2023.10097124.\n\n\n12.5.3 Selective Update Sharing\nThere are many methods for selectively sharing updates. The general principle is that reducing the portion of the model that the clients are training on the edge reduces the memory necessary for training and the size of communication to the server. In basic federated learning, the client trains the entire model. This means that when a client sends an update to the server, it has gradients for every weight in the network.\nHowever, we cannot just reduce communication by sending pieces of those gradients from each client to the server because the gradients are part of an entire update required to improve the model. Instead, you need to architecturally design the model such that each client trains only a small portion of the broader model, reducing the total communication while still gaining the benefit of training on client data. A paper (Shi and Radu 2022) from the University of Sheffield applies this concept to a CNN by splitting the global model into two parts: an upper and a lower part, as shown in Z. Chen and Xu (2023).\n\nShi, Hongrui, and Valentin Radu. 2022. “Data Selection for Efficient Model Update in Federated Learning.” In Proceedings of the 2nd European Workshop on Machine Learning and Systems, 72–78. ACM. https://doi.org/10.1145/3517207.3526980.\n\n\n\n\n\n\nFigure 12.6: Split model architecture for selective sharing. Credit: Shi et al., (2022).\n\n\n\nThe lower part is designed to focus on generic features in the dataset, while the upper part, trained on those generic features, is designed to be more sensitive to the activation maps. This means that the lower part of the model is trained through standard federated averaging across all of the clients. Meanwhile, the upper part of the model is trained entirely on the server side from the activation maps generated by the clients. This approach drastically reduces communication for the model while still making the network robust to various types of input found in the data on the client devices.\n\n\n12.5.4 Optimized Aggregation\nIn addition to reducing the communication overhead, optimizing the aggregation function can improve model training speed and accuracy in certain federated learning use cases. While the standard for aggregation is just averaging, various other approaches can improve model efficiency, accuracy, and security. One alternative is clipped averaging, which clips the model updates within a specific range. Another strategy to preserve security is differential privacy average aggregation. This approach integrates differential privacy into the aggregations tep to protect client identities. Each client adds a layer of random noise to their updates before communicating to the server. The server then updates the server with the noisy updates, meaning that the amount of noise needs to be tuned carefully to balance privacy and accuracy.\nIn addition to security-enhancing aggregation methods, there are several modifications to the aggregation methods that can improve training speed and performance by adding client metadata along with the weight updates. Momentum aggregation is a technique that helps address the convergence problem. In federated learning, client data can be extremely heterogeneous depending on the different environments in which the devices are used. That means that many models with heterogeneous data may need help to converge. Each client stores a momentum term locally, which tracks the pace of change over several updates. With clients communicating this momentum, the server can factor in the rate of change of each update when changing the global model to accelerate convergence. Similarly, weighted aggregation can factor in the client performance or other parameters like device type or network connection strength to adjust the weight with which the server should incorporate the model updates. Further description of specific aggregation algorithms is described by Moshawrab et al. (2023).\n\nMoshawrab, Mohammad, Mehdi Adda, Abdenour Bouzouane, Hussein Ibrahim, and Ali Raad. 2023. “Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations and Future Perspectives.” Electronics 12 (10): 2287. https://doi.org/10.3390/electronics12102287.\n\n\n12.5.5 Handling non-IID Data\nWhen using federated learning to train a model across many client devices, it is convenient to consider the data to be independent and identically distributed (IID) across all clients. When data is IID, the model will converge faster and perform better because each local update on any given client is more representative of the broader dataset. This makes aggregation straightforward, as you can directly average all clients. However, this differs from how data often appears in the real world. Consider a few of the following ways in which data may be non-IID:\n\nIf you are learning on a set of health-monitor devices, different device models could mean different sensor qualities and properties. This means that low-quality sensors and devices may produce data, and therefore, model updates distinctly different than high-quality ones\nA smart keyboard trained to perform autocorrect. If you have a disproportionate amount of devices from a certain region, the slang, sentence structure, or even language they were using could skew more model updates towards a certain style of typing\nIf you have wildlife sensors in remote areas, connectivity may not be equally distributed, causing some clients in certain regions to be unable to send more model updates than others. If those regions have different wildlife activity from certain species, that could skew the updates toward those animals\n\nThere are a few approaches to addressing non-IID data in federated learning. One approach would be to change the aggregation algorithm. If you use a weighted aggregation algorithm, you can adjust based on different client properties like region, sensor properties, or connectivity (Zhao et al. 2018).\n\nZhao, Yue, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. 2018. “Federated Learning with Non-Iid Data.” ArXiv Preprint abs/1806.00582. https://arxiv.org/abs/1806.00582.\n\n\n12.5.6 Client Selection\nConsidering all of the factors influencing the efficacy of federated learning, like IID data and communication, client selection is a key component to ensuring a system trains well. Selecting the wrong clients can skew the dataset, resulting in non-IID data. Similarly, choosing clients randomly with bad network connections can slow down communication. Therefore, several key characteristics must be considered when selecting the right subset of clients.\nWhen selecting clients, there are three main components to consider: data heterogeneity, resource allocation, and communication cost. We can select clients on the previously proposed metrics in the non-IID section to address data heterogeneity. In federated learning, all devices may have different amounts of computing, resulting in some being more inefficient at training than others. When selecting a subset of clients for training, one must consider a balance of data heterogeneity and available resources. In an ideal scenario, you can always select the subset of clients with the greatest resources. However, this may skew your dataset, so a balance must be struck. Communication differences add another layer; you want to avoid being bottlenecked by waiting for devices with poor connections to transmit all their updates. Therefore, you must also consider choosing a subset of diverse yet well-connected devices.\n\n\n12.5.7 An Example of Deployed Federated Learning: G board\nA primary example of a deployed federated learning system is Google’s Keyboard, Gboard, for Android devices. In implementing federated learning for the keyboard, Google focused on employing differential privacy techniques to protect the user’s data and identity. Gboard leverages language models for several key features, such as Next Word Prediction (NWP), Smart Compose (SC), and On-The-Fly rescoring (OTF) (Xu et al. 2023), as shown in Figure 12.7.\n\nXu, Zheng, Yanxiang Zhang, Galen Andrew, Christopher A Choquette-Choo, Peter Kairouz, H Brendan McMahan, Jesse Rosenstock, and Yuanbo Zhang. 2023. “Federated Learning of Gboard Language Models with Differential Privacy.” ArXiv Preprint abs/2305.18465. https://arxiv.org/abs/2305.18465.\nNWP will anticipate the next word the user tries to type based on the previous one. SC gives inline suggestions to speed up the typing based on each character. OTF will re-rank the proposed next words based on the active typing process. All three of these models need to run quickly on the edge, and federated learning can accelerate training on the users’ data. However, uploading every word a user typed to the cloud for training would be a massive privacy violation. Therefore, federated learning emphasizes differential privacy, which protects the user while enabling a better user experience.\n\n\n\n\n\n\nFigure 12.7: Google G Board Features. Credit: Zheng et al., (2023).\n\n\n\nTo accomplish this goal, Google employed its algorithm DP-FTRL, which provides a formal guarantee that trained models will not memorize specific user data or identities. The algorithm system design is shown in Figure 12.8. DP-FTRL, combined with secure aggregation, encrypts model updates and provides an optimal balance of privacy and utility. Furthermore, adaptive clipping is applied in the aggregation process to limit the impact of individual users on the global model (step 3 in Figure 12.8). By combining all these techniques, Google can continuously refine its keyboard while preserving user privacy in a formally provable way.\n\n\n\n\n\n\nFigure 12.8: Differential Privacy in G Board. Credit: Zheng et al., (2023).\n\n\n\n\n\n\n\n\n\nExercise 12.2: Federated Learning - Text Generation\n\n\n\n\n\nHave you ever used those smart keyboards to suggest the next word? With federated learning, we can make them even better without sacrificing privacy. In this Colab, we’ll teach an AI to predict words by training on text data spread across devices. Get ready to make your typing even smoother!\n\n\n\n\n\n\n\n\n\n\nExercise 12.3: Federated Learning - Image Classification\n\n\n\n\n\nWant to train an image-savvy AI without sending your photos to the cloud? Federated learning is the answer! In this Colab, we’ll train a model across multiple devices, each learning from its images. Privacy is protected, and teamwork makes the AI dream work!\n\n\n\n\n\n\n12.5.8 Benchmarking for Federated Learning: MedPerf\nOne of the richest examples of data on the edge is medical devices. These devices store some of the most personal data on users but offer huge advances in personalized treatment and better accuracy in medical AI. Given these two factors, medical devices are the perfect use case for federated learning. MedPerf is an open-source platform used to benchmark models using federated evaluation (Karargyris et al. 2023). Instead of just training models via federated learning, MedPerf takes the model to edge devices to test it against personalized data while preserving privacy. In this way, a benchmark committee can evaluate various models in the real world on edge devices while still preserving patient anonymity.\n\nKarargyris, Alexandros, Renato Umeton, Micah J Sheller, Alejandro Aristizabal, Johnu George, Anna Wuest, Sarthak Pati, et al. 2023. “Federated Benchmarking of Medical Artificial Intelligence with MedPerf.” Nature Machine Intelligence 5 (7): 799–810. https://doi.org/10.1038/s42256-023-00652-2.", + "text": "12.5 Federated Machine Learning\nFederated Learning Overview\nThe modern internet is full of large networks of connected devices. Whether it’s cell phones, thermostats, smart speakers, or other IOT products, countless edge devices are a goldmine for hyper-personalized, rich data. However, with that rich data comes an assortment of problems with information transfer and privacy. Constructing a training dataset in the cloud from these devices would involve high volumes of bandwidth, cost-efficient data transfer, and violation of users’ privacy.\nFederated learning offers a solution to these problems: train models partially on the edge devices and only communicate model updates to the cloud. In 2016, a team from Google designed architecture for federated learning that attempts to address these problems.\nIn their initial paper, Google outlines a principle federated learning algorithm called FederatedAveraging, which is shown in Figure 12.5. Specifically, FederatedAveraging performs stochastic gradient descent (SGD) over several different edge devices. In this process, each device calculates a gradient \\(g_k = \\nabla F_k(w_t)\\) which is then applied to update the server-side weights as (with \\(\\eta\\) as learning rate across \\(k\\) clients): \\[\nw_{t+1} \\rightarrow w_t - \\eta \\sum_{k=1}^{K} \\frac{n_k}{n}g_k\n\\] This summarizes the basic algorithm for federated learning on the right. For each round of training, the server takes a random set of client devices and calls each client to train on its local batch using the most recent server-side weights. Those weights are then returned to the server, where they are collected individually and averaged to update the global model weights.\n\n\n\n\n\n\nFigure 12.5: Google’s Proposed FederatedAverage Algorithm. Credit: McMahan et al. (2017).\n\n\n\nWith this proposed structure, there are a few key vectors for further optimizing federated learning. We will outline each in the following subsections.\nVideo 12.2 gives an overview of federated learning.\n\n\n\n\n\n\nVideo 12.2: Transfer Learning\n\n\n\n\n\n\n\n12.5.1 Communication Efficiency\nOne of the key bottlenecks in federated learning is communication. Every time a client trains the model, they must communicate their updates back to the server. Similarly, once the server has averaged all the updates, it must send them back to the client. This incurs huge bandwidth and resource costs on large networks of millions of devices. As the field of federated learning advances, a few optimizations have been developed to minimize this communication. To address the footprint of the model, researchers have developed model compression techniques. In the client-server protocol, federated learning can also minimize communication through the selective sharing of updates on clients. Finally, efficient aggregation techniques can also streamline the communication process.\n\n\n12.5.2 Model Compression\nIn standard federated learning, the server communicates the entire model to each client, and then the client sends back all of the updated weights. This means that the easiest way to reduce the client’s memory and communication footprint is to minimize the size of the model needed to be communicated. We can employ all of the previously discussed model optimization strategies to do this.\nIn 2022, another team at Google proposed that each client communicates via a compressed format and decompresses the model on the fly for training (Yang et al. 2023), allocating and deallocating the full memory for the model only for a short period while training. The model is compressed through a range of various quantization strategies elaborated upon in their paper. Meanwhile, the server can update the uncompressed model by decompressing and applying updates as they come in.\n\nYang, Tien-Ju, Yonghui Xiao, Giovanni Motta, Françoise Beaufays, Rajiv Mathews, and Mingqing Chen. 2023. “Online Model Compression for Federated Learning with Large Models.” In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. IEEE; IEEE. https://doi.org/10.1109/icassp49357.2023.10097124.\n\n\n12.5.3 Selective Update Sharing\nThere are many methods for selectively sharing updates. The general principle is that reducing the portion of the model that the clients are training on the edge reduces the memory necessary for training and the size of communication to the server. In basic federated learning, the client trains the entire model. This means that when a client sends an update to the server, it has gradients for every weight in the network.\nHowever, we cannot just reduce communication by sending pieces of those gradients from each client to the server because the gradients are part of an entire update required to improve the model. Instead, you need to architecturally design the model such that each client trains only a small portion of the broader model, reducing the total communication while still gaining the benefit of training on client data. A paper (Shi and Radu 2022) from the University of Sheffield applies this concept to a CNN by splitting the global model into two parts: an upper and a lower part, as shown in Z. Chen and Xu (2023).\n\nShi, Hongrui, and Valentin Radu. 2022. “Data Selection for Efficient Model Update in Federated Learning.” In Proceedings of the 2nd European Workshop on Machine Learning and Systems, 72–78. ACM. https://doi.org/10.1145/3517207.3526980.\n\n\n\n\n\n\nFigure 12.6: Split model architecture for selective sharing. Credit: Shi et al., (2022).\n\n\n\nThe lower part is designed to focus on generic features in the dataset, while the upper part, trained on those generic features, is designed to be more sensitive to the activation maps. This means that the lower part of the model is trained through standard federated averaging across all of the clients. Meanwhile, the upper part of the model is trained entirely on the server side from the activation maps generated by the clients. This approach drastically reduces communication for the model while still making the network robust to various types of input found in the data on the client devices.\n\n\n12.5.4 Optimized Aggregation\nIn addition to reducing the communication overhead, optimizing the aggregation function can improve model training speed and accuracy in certain federated learning use cases. While the standard for aggregation is just averaging, various other approaches can improve model efficiency, accuracy, and security. One alternative is clipped averaging, which clips the model updates within a specific range. Another strategy to preserve security is differential privacy average aggregation. This approach integrates differential privacy into the aggregations tep to protect client identities. Each client adds a layer of random noise to their updates before communicating to the server. The server then updates the server with the noisy updates, meaning that the amount of noise needs to be tuned carefully to balance privacy and accuracy.\nIn addition to security-enhancing aggregation methods, there are several modifications to the aggregation methods that can improve training speed and performance by adding client metadata along with the weight updates. Momentum aggregation is a technique that helps address the convergence problem. In federated learning, client data can be extremely heterogeneous depending on the different environments in which the devices are used. That means that many models with heterogeneous data may need help to converge. Each client stores a momentum term locally, which tracks the pace of change over several updates. With clients communicating this momentum, the server can factor in the rate of change of each update when changing the global model to accelerate convergence. Similarly, weighted aggregation can factor in the client performance or other parameters like device type or network connection strength to adjust the weight with which the server should incorporate the model updates. Further description of specific aggregation algorithms is described by Moshawrab et al. (2023).\n\nMoshawrab, Mohammad, Mehdi Adda, Abdenour Bouzouane, Hussein Ibrahim, and Ali Raad. 2023. “Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations and Future Perspectives.” Electronics 12 (10): 2287. https://doi.org/10.3390/electronics12102287.\n\n\n12.5.5 Handling non-IID Data\nWhen using federated learning to train a model across many client devices, it is convenient to consider the data to be independent and identically distributed (IID) across all clients. When data is IID, the model will converge faster and perform better because each local update on any given client is more representative of the broader dataset. This makes aggregation straightforward, as you can directly average all clients. However, this differs from how data often appears in the real world. Consider a few of the following ways in which data may be non-IID:\n\nIf you are learning on a set of health-monitor devices, different device models could mean different sensor qualities and properties. This means that low-quality sensors and devices may produce data, and therefore, model updates distinctly different than high-quality ones\nA smart keyboard trained to perform autocorrect. If you have a disproportionate amount of devices from a certain region, the slang, sentence structure, or even language they were using could skew more model updates towards a certain style of typing\nIf you have wildlife sensors in remote areas, connectivity may not be equally distributed, causing some clients in certain regions to be unable to send more model updates than others. If those regions have different wildlife activity from certain species, that could skew the updates toward those animals\n\nThere are a few approaches to addressing non-IID data in federated learning. One approach would be to change the aggregation algorithm. If you use a weighted aggregation algorithm, you can adjust based on different client properties like region, sensor properties, or connectivity (Zhao et al. 2018).\n\nZhao, Yue, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. 2018. “Federated Learning with Non-Iid Data.” ArXiv Preprint abs/1806.00582. https://arxiv.org/abs/1806.00582.\n\n\n12.5.6 Client Selection\nConsidering all of the factors influencing the efficacy of federated learning, like IID data and communication, client selection is a key component to ensuring a system trains well. Selecting the wrong clients can skew the dataset, resulting in non-IID data. Similarly, choosing clients randomly with bad network connections can slow down communication. Therefore, several key characteristics must be considered when selecting the right subset of clients.\nWhen selecting clients, there are three main components to consider: data heterogeneity, resource allocation, and communication cost. We can select clients on the previously proposed metrics in the non-IID section to address data heterogeneity. In federated learning, all devices may have different amounts of computing, resulting in some being more inefficient at training than others. When selecting a subset of clients for training, one must consider a balance of data heterogeneity and available resources. In an ideal scenario, you can always select the subset of clients with the greatest resources. However, this may skew your dataset, so a balance must be struck. Communication differences add another layer; you want to avoid being bottlenecked by waiting for devices with poor connections to transmit all their updates. Therefore, you must also consider choosing a subset of diverse yet well-connected devices.\n\n\n12.5.7 An Example of Deployed Federated Learning: G board\nA primary example of a deployed federated learning system is Google’s Keyboard, Gboard, for Android devices. In implementing federated learning for the keyboard, Google focused on employing differential privacy techniques to protect the user’s data and identity. Gboard leverages language models for several key features, such as Next Word Prediction (NWP), Smart Compose (SC), and On-The-Fly rescoring (OTF) (Xu et al. 2023), as shown in Figure 12.7.\n\nXu, Zheng, Yanxiang Zhang, Galen Andrew, Christopher A Choquette-Choo, Peter Kairouz, H Brendan McMahan, Jesse Rosenstock, and Yuanbo Zhang. 2023. “Federated Learning of Gboard Language Models with Differential Privacy.” ArXiv Preprint abs/2305.18465. https://arxiv.org/abs/2305.18465.\nNWP will anticipate the next word the user tries to type based on the previous one. SC gives inline suggestions to speed up the typing based on each character. OTF will re-rank the proposed next words based on the active typing process. All three of these models need to run quickly on the edge, and federated learning can accelerate training on the users’ data. However, uploading every word a user typed to the cloud for training would be a massive privacy violation. Therefore, federated learning emphasizes differential privacy, which protects the user while enabling a better user experience.\n\n\n\n\n\n\nFigure 12.7: Google G Board Features. Credit: Zheng et al., (2023).\n\n\n\nTo accomplish this goal, Google employed its algorithm DP-FTRL, which provides a formal guarantee that trained models will not memorize specific user data or identities. The algorithm system design is shown in Figure 12.8. DP-FTRL, combined with secure aggregation, encrypts model updates and provides an optimal balance of privacy and utility. Furthermore, adaptive clipping is applied in the aggregation process to limit the impact of individual users on the global model (step 3 in Figure 12.8). By combining all these techniques, Google can continuously refine its keyboard while preserving user privacy in a formally provable way.\n\n\n\n\n\n\nFigure 12.8: Differential Privacy in G Board. Credit: Zheng et al., (2023).\n\n\n\n\n\n\n\n\n\nExercise 12.2: Federated Learning - Text Generation\n\n\n\n\n\nHave you ever used those smart keyboards to suggest the next word? With federated learning, we can make them even better without sacrificing privacy. In this Colab, we’ll teach an AI to predict words by training on text data spread across devices. Get ready to make your typing even smoother!\n\n\n\n\n\n\n\n\n\n\nExercise 12.3: Federated Learning - Image Classification\n\n\n\n\n\nWant to train an image-savvy AI without sending your photos to the cloud? Federated learning is the answer! In this Colab, we’ll train a model across multiple devices, each learning from its images. Privacy is protected, and teamwork makes the AI dream work!\n\n\n\n\n\n\n12.5.8 Benchmarking for Federated Learning: MedPerf\nOne of the richest examples of data on the edge is medical devices. These devices store some of the most personal data on users but offer huge advances in personalized treatment and better accuracy in medical AI. Given these two factors, medical devices are the perfect use case for federated learning. MedPerf is an open-source platform used to benchmark models using federated evaluation (Karargyris et al. 2023). Instead of just training models via federated learning, MedPerf takes the model to edge devices to test it against personalized data while preserving privacy. In this way, a benchmark committee can evaluate various models in the real world on edge devices while still preserving patient anonymity.\n\nKarargyris, Alexandros, Renato Umeton, Micah J Sheller, Alejandro Aristizabal, Johnu George, Anna Wuest, Sarthak Pati, et al. 2023. “Federated Benchmarking of Medical Artificial Intelligence with MedPerf.” Nature Machine Intelligence 5 (7): 799–810. https://doi.org/10.1038/s42256-023-00652-2.", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1320,7 +1320,7 @@ "section": "12.6 Security Concerns", "text": "12.6 Security Concerns\nPerforming ML model training and adaptation on end-user devices also introduces security risks that must be addressed. Some key security concerns include:\n\nExposure of private data: Training data may be leaked or stolen from devices\nData poisoning: Adversaries can manipulate training data to degrade model performance\nModel extraction: Attackers may attempt to steal trained model parameters\nMembership inference: Models may reveal the participation of specific users’ data\nEvasion attacks: Specially crafted inputs can cause misclassification\n\nAny system that performs learning on-device introduces security concerns, as it may expose vulnerabilities in larger-scale models. Numerous security risks are associated with any ML model, but these risks have specific consequences for on-device learning. Fortunately, there are methods to mitigate these risks and improve the real-world performance of on-device learning.\n\n12.6.1 Data Poisoning\nOn-device ML introduces unique data security challenges compared to traditional cloud-based training. In particular, data poisoning attacks pose a serious threat during on-device learning. Adversaries can manipulate training data to degrade model performance when deployed.\nSeveral data poisoning attack techniques exist:\n\nLabel Flipping: It involves applying incorrect labels to samples. For instance, in image classification, cat photos may be labeled as dogs to confuse the model. Flipping even 10% of labels can have significant consequences on the model.\nData Insertion: It introduces fake or distorted inputs into the training set. This could include pixelated images, noisy audio, or garbled text.\n** Logic Corruption: ** This alters the underlying [patterns] (https://www.worldscientific.com/doi/10.1142/S0218001414600027) in data to mislead the model. In sentiment analysis, highly negative reviews may be marked positive through this technique. For this reason, recent surveys have shown that many companies are more afraid of data poisoning than other adversarial ML concerns.\n\nWhat makes data poisoning alarming is how it exploits the discrepancy between curated datasets and live training data. Consider a cat photo dataset collected from the internet. In the weeks later, when this data trains a model on-device, new cat photos on the web differ significantly.\nWith data poisoning, attackers purchase domains and upload content that influences a portion of the training data. Even small data changes significantly impact the model’s learned behavior. Consequently, poisoning can instill racist, sexist, or other harmful biases if unchecked.\nMicrosoft Tay was a chatbot launched by Microsoft in 2016. It was designed to learn from its interactions with users on social media platforms like Twitter. Unfortunately, Microsoft Tay became a prime example of data poisoning in ML models. Within 24 hours of its launch, Microsoft had to take Tay offline because it had started producing offensive and inappropriate messages, including hate speech and racist comments. This occurred because some users on social media intentionally fed Tay with harmful and offensive input, which the chatbot then learned from and incorporated into its responses.\nThis incident is a clear example of data poisoning because malicious actors intentionally manipulated the data used to train and inform the chatbot’s responses. The data poisoning resulted in the chatbot adopting harmful biases and producing output that its developers did not intend. It demonstrates how even small amounts of maliciously crafted data can significantly impact the behavior of ML models and highlights the importance of implementing robust data filtering and validation mechanisms to prevent such incidents from occurring.\nSuch biases could have dangerous real-world impacts. Rigorous data validation, anomaly detection, and tracking of data provenance are critical defensive measures. Adopting frameworks like Five Safes ensures models are trained on high-quality, representative data (Desai et al. 2016).\n\nDesai, Tanvi, Felix Ritchie, Richard Welpton, et al. 2016. “Five Safes: Designing Data Access for Research.” Economics Working Paper Series 1601: 28.\nData poisoning is a pressing concern for secure on-device learning since data at the endpoint cannot be easily monitored in real-time. If models are allowed to adapt on their own, then we run the risk of the device acting maliciously. However, continued research in adversarial ML aims to develop robust solutions to detect and mitigate such data attacks.\n\n\n12.6.2 Adversarial Attacks\nDuring the training phase, attackers might inject malicious data into the training dataset, which can subtly alter the model’s behavior. For example, an attacker could add images of cats labeled as dogs to a dataset used to train an image classification model. If done cleverly, the model’s accuracy might not significantly drop, and the attack could be noticed. The model would then incorrectly classify some cats as dogs, which could have consequences depending on the application.\nIn an embedded security camera system, for instance, this could allow an intruder to avoid detection by wearing a specific pattern that the model has been tricked into classifying as non-threatening.\nDuring the inference phase, attackers can use adversarial examples to fool the model. Adversarial examples are inputs that have been slightly altered in a way that causes the model to make incorrect predictions. For instance, an attacker might add a small amount of noise to an image in a way that causes a face recognition system to misidentify a person. These attacks can be particularly concerning in applications where safety is at stake, such as autonomous vehicles. In the example you mentioned, the researchers were able to cause a traffic sign recognition system to misclassify a stop sign as a speed sign. This type of misclassification could lead to accidents if it occurred in a real-world autonomous driving system.\nTo mitigate these risks, several defenses can be employed:\n\nData Validation and Sanitization: Before incorporating new data into the training dataset, it should be thoroughly validated and sanitized to ensure it is not malicious.\nAdversarial Training: The model can be trained on adversarial examples to make it more robust to these types of attacks.\nInput Validation: During inference, inputs should be validated to ensure they have not been manipulated to create adversarial examples.\nRegular Auditing and Monitoring: Regularly auditing and monitoring the model’s behavior can help detect and mitigate adversarial attacks. However, this is easier said than done in the context of tiny ML systems. It is often hard to monitor embedded ML systems at the endpoint due to communication bandwidth limitations, which we will discuss in the MLOps chapter.\n\nBy understanding the potential risks and implementing these defenses, we can help secure on-device training at the endpoint/edge and mitigate the impact of adversarial attacks. Most people easily confuse data poisoning and adversarial attacks. So Table 12.2 compares data poisoning and adversarial attacks:\n\n\n\nTable 12.2: Comparison of data poisoning and adversarial attacks.\n\n\n\n\n\n\n\n\n\n\nAspect\nData Poisoning\nAdversarial Attacks\n\n\n\n\nTiming\nTraining phase\nInference phase\n\n\nTarget\nTraining data\nInput data\n\n\nGoal\nNegatively affect model’s performance\nCause incorrect predictions\n\n\nMethod\nInsert malicious examples into training data, often with incorrect labels\nAdd carefully crafted noise to input data\n\n\nExample\nAdding images of cats labeled as dogs to a dataset used for training an image classification model\nAdding a small amount of noise to an image in a way that causes a face recognition system to misidentify a person\n\n\nPotential Effects\nModel learns incorrect patterns and makes incorrect predictions\nImmediate and potentially dangerous incorrect predictions\n\n\nApplications Affected\nAny ML model\nAutonomous vehicles, security systems, etc\n\n\n\n\n\n\n\n\n12.6.3 Model Inversion\nModel inversion attacks are a privacy threat to on-device machine learning models trained on sensitive user data (Nguyen et al. 2023). Understanding this attack vector and mitigation strategies will be important for building secure and ethical on-device AI. For example, imagine an iPhone app that uses on-device learning to categorize photos in your camera roll into groups like “beach,” “food,” or “selfies” for easier searching.\n\nNguyen, Ngoc-Bao, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and Ngai-Man Cheung. 2023. “Re-Thinking Model Inversion Attacks Against Deep Neural Networks.” In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16384–93. IEEE. https://doi.org/10.1109/cvpr52729.2023.01572.\nThe on-device model may be trained by Apple on a dataset of iCloud photos from consenting users. A malicious attacker could attempt to extract parts of those original iCloud training photos using model inversion. Specifically, the attacker feeds crafted synthetic inputs into the on-device photo classifier. By tweaking the synthetic inputs and observing how the model categorizes them, they can refine the inputs until they reconstruct copies of the original training data - like a beach photo from a user’s iCloud. Now, the attacker has breached that user’s privacy by obtaining one of their photos without consent. This demonstrates why model inversion is dangerous - it can potentially leak highly sensitive training data.\nPhotos are an especially high-risk data type because they often contain identifiable people, location information, and private moments. However, the same attack methodology could apply to other personal data, such as audio recordings, text messages, or users’ health data.\nTo defend against model inversion, one would need to take precautions like adding noise to the model outputs or using privacy-preserving machine learning techniques like federated learning to train the on-device model. The goal is to prevent attackers from being able to reconstruct the original training data.\n\n\n12.6.4 On-Device Learning Security Concerns\nWhile data poisoning and adversarial attacks are common concerns for ML models in general, on-device learning introduces unique security risks. When on-device variants of large-scale models are published, adversaries can exploit these smaller models to attack their larger counterparts. Research has demonstrated that as on-device models and full-scale models become more similar, the vulnerability of the original large-scale models increases significantly. For instance, evaluations across 19 Deep Neural Networks (DNNs) revealed that exploiting on-device models could increase the vulnerability of the original large-scale models by up to 100 times.\nThere are three primary types of security risks specific to on-device learning:\n\nTransfer-Based Attacks: These attacks exploit the transferability property between a surrogate model (an approximation of the target model, similar to an on-device model) and a remote target model (the original full-scale model). Attackers generate adversarial examples using the surrogate model, which can then be used to deceive the target model. For example, imagine an on-device model designed to identify spam emails. An attacker could use this model to generate a spam email that is not detected by the larger, full-scale filtering system.\nOptimization-Based Attacks: These attacks generate adversarial examples for transfer-based attacks using some form of the objective function and iteratively modify inputs to achieve the desired outcome. Gradient estimation attacks, for example, approximate the model’s gradient using query outputs (such as softmax confidence scores), while gradient-free attacks use the model’s final decision (the predicted class) to approximate the gradient, albeit requiring many more queries.\nQuery Attacks with Transfer Priors: These attacks combine elements of transfer-based and optimization-based attacks. They reverse engineer on-device models to serve as surrogates for the target full-scale model. In other words, attackers use the smaller on-device model to understand how the larger model works and then use this knowledge to attack the full-scale model.\n\nBy understanding these specific risks associated with on-device learning, we can develop more robust security protocols to protect both on-device and full-scale models from potential attacks.\n\n\n12.6.5 Mitigation of On-Device Learning Risks\nVarious methods can be employed to mitigate the numerous security risks associated with on-device learning. These methods may be specific to the type of attack or serve as a general tool to bolster security.\nOne strategy to reduce security risks is to diminish the similarity between on-device models and full-scale models, thereby reducing transferability by up to 90%. This method, known as similarity-unpairing, addresses the problem that arises when adversaries exploit the input-gradient similarity between the two models. By finetuning the full-scale model to create a new version with similar accuracy but different input gradients, we can construct the on-device model by quantizing this updated full-scale model. This unpairing reduces the vulnerability of on-device models by limiting the exposure of the original full-scale model. Importantly, the order of finetuning and quantization can be varied while still achieving risk mitigation (Hong, Carlini, and Kurakin 2023).\nTo tackle data poisoning, it is imperative to source datasets from trusted and reliable vendors.\nSeveral strategies can be employed to combat adversarial attacks. A proactive approach involves generating adversarial examples and incorporating them into the model’s training dataset, thereby fortifying the model against such attacks. Tools like CleverHans, an open-source training library, are instrumental in creating adversarial examples. Defense distillation is another effective strategy, wherein the on-device model outputs probabilities of different classifications rather than definitive decisions (Hong, Carlini, and Kurakin 2023), making it more challenging for adversarial examples to exploit the model.\n\nHong, Sanghyun, Nicholas Carlini, and Alexey Kurakin. 2023. “Publishing Efficient on-Device Models Increases Adversarial Vulnerability.” In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 271–90. IEEE; IEEE. https://doi.org/10.1109/satml54575.2023.00026.\nThe theft of intellectual property is another significant concern when deploying on-device models. Intellectual property theft is a concern when deploying on-device models, as adversaries may attempt to reverse-engineer the model to steal the underlying technology. To safeguard against intellectual property theft, the binary executable of the trained model should be stored on a microcontroller unit with encrypted software and secured physical interfaces of the chip. Furthermore, the final dataset used for training the model should be kept private.\nFurthermore, on-device models often utilize well-known or open-source datasets, such as MobileNet’s Visual Wake Words. As such, it is important to maintain the privacy of the final dataset used for training the model. Additionally, protecting the data augmentation process and incorporating specific use cases can minimize the risk of reverse-engineering an on-device model.\nLastly, the Adversarial Threat Landscape for Artificial Intelligence Systems (ATLAS) serves as a valuable matrix tool that helps assess the risk profile of on-device models, empowering developers to identify and mitigate potential risks proactively.\n\n\n12.6.6 Securing Training Data\nThere are various ways to secure on-device training data. Each concept is really deep and could be worth a class by itself. So here, we’ll briefly allude to those concepts so you’re aware of what to learn further.\n\nEncryption\nEncryption serves as the first line of defense for training data. This involves implementing end-to-end encryption for local storage on devices and communication channels to prevent unauthorized access to raw training data. Trusted execution environments, such as Intel SGX and ARM TrustZone, are essential for facilitating secure training on encrypted data.\nAdditionally, when aggregating updates from multiple devices, secure multi-party computation protocols can be employed to enhance security (Kairouz, Oh, and Viswanath 2015); a practical application of this is in collaborative on-device learning, where cryptographic privacy-preserving aggregation of user model updates can be implemented. This technique effectively hides individual user data even during the aggregation phase.\n\nKairouz, Peter, Sewoong Oh, and Pramod Viswanath. 2015. “Secure Multi-Party Differential Privacy.” In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, edited by Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, 2008–16. https://proceedings.neurips.cc/paper/2015/hash/a01610228fe998f515a72dd730294d87-Abstract.html.\n\n\nDifferential Privacy\nDifferential privacy is another crucial strategy for protecting training data. By injecting calibrated statistical noise into the data, we can mask individual records while still extracting valuable population patterns (Dwork and Roth 2013). Managing the privacy budget across multiple training iterations and reducing noise as the model converges is also vital (Abadi et al. 2016). Methods such as formally provable differential privacy, which may include adding Laplace or Gaussian noise scaled to the dataset’s sensitivity, can be employed.\n\nDwork, Cynthia, and Aaron Roth. 2013. “The Algorithmic Foundations of Differential Privacy.” Foundations and Trends in Theoretical Computer Science 9 (3-4): 211–407. https://doi.org/10.1561/0400000042.\n\nAbadi, Martin, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. “Deep Learning with Differential Privacy.” In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308–18. CCS ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2976749.2978318.\n\n\nAnomaly Detection\nAnomaly detection plays an important role in identifying and mitigating potential data poisoning attacks. This can be achieved through statistical analyses like Principal Component Analysis (PCA) and clustering, which help to detect deviations in aggregated training data. Time-series methods such as Cumulative Sum (CUSUM) charts are useful for identifying shifts indicative of potential poisoning. Comparing current data distributions with previously seen clean data distributions can also help to flag anomalies. Moreover, suspected poisoned batches should be removed from the training update aggregation process. For example, spot checks on subsets of training images on devices can be conducted using photoDNA hashes to identify poisoned inputs.\n\n\nInput Data Validation\nLastly, input data validation is essential for ensuring the integrity and validity of input data before it is fed into the training model, thereby protecting against adversarial payloads. Similarity measures, such as cosine distance, can be employed to catch inputs that deviate significantly from the expected distribution. Suspicious inputs that may contain adversarial payloads should be quarantined and sanitized. Furthermore, parser access to training data should be restricted to validated code paths only. Leveraging hardware security features, such as ARM Pointer Authentication, can prevent memory corruption (ARM Limited, 2023). An example of this is implementing input integrity checks on audio training data used by smart speakers before processing by the speech recognition model (Z. Chen and Xu 2023).\n\nChen, Zhiyong, and Shugong Xu. 2023. “Learning Domain-Heterogeneous Speaker Recognition Systems with Personalized Continual Federated Learning.” EURASIP Journal on Audio, Speech, and Music Processing 2023 (1): 33. https://doi.org/10.1186/s13636-023-00299-2.", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1331,7 +1331,7 @@ "section": "12.7 On-Device Training Frameworks", "text": "12.7 On-Device Training Frameworks\nEmbedded inference frameworks like TF-Lite Micro (David et al. 2021), TVM (T. Chen et al. 2018), and MCUNet (Lin et al. 2020) provide a slim runtime for running neural network models on microcontrollers and other resource-constrained devices. However, they don’t support on-device training. Training requires its own set of specialized tools due to the impact of quantization on gradient calculation and the memory footprint of backpropagation (Lin et al. 2022).\n\nDavid, Robert, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, et al. 2021. “Tensorflow Lite Micro: Embedded Machine Learning for Tinyml Systems.” Proceedings of Machine Learning and Systems 3: 800–811.\n\nChen, Tianqi, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, et al. 2018. “TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.” In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 578–94.\n\nLin, Ji, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han. 2020. “MCUNet: Tiny Deep Learning on IoT Devices.” In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/86c51678350f656dcc7f490a43946ee5-Abstract.html.\n\nLin, Ji, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song Han. 2022. “On-Device Training Under 256kb Memory.” Adv. Neur. In. 35: 22941–54.\nIn recent years, a handful of tools and frameworks have started to emerge that enable on-device training. These include Tiny Training Engine (Lin et al. 2022), TinyTL (Cai et al. 2020), and TinyTrain (Kwon et al. 2023).\n\n12.7.1 Tiny Training Engine\nTiny Training Engine (TTE) uses several techniques to optimize memory usage and speed up the training process. An overview of the TTE workflow is shown in Figure 12.9. First, TTE offloads the automatic differentiation to compile time instead of runtime, significantly reducing overhead during training. Second, TTE performs graph optimization like pruning and sparse updates to reduce memory requirements and accelerate computations.\n\n\n\n\n\n\nFigure 12.9: TTE workflow.\n\n\n\nSpecifically, TTE follows four main steps:\n\nDuring compile time, TTE traces the forward propagation graph and derives the corresponding backward graph for backpropagation. This allows differentiation to happen at compile time rather than runtime.\nTTE prunes any nodes representing frozen weights from the backward graph. Frozen weights are weights that are not updated during training to reduce certain neurons’ impact. Pruning their nodes saves memory.\nTTE reorders the gradient descent operators to interleave them with the backward pass computations. This scheduling minimizes memory footprints.\nTTE uses code generation to compile the optimized forward and backward graphs, which are then deployed for on-device training.\n\n\n\n12.7.2 Tiny Transfer Learning\nTiny Transfer Learning (TinyTL) enables memory-efficient on-device training through a technique called weight freezing. During training, much of the memory bottleneck comes from storing intermediate activations and updating the weights in the neural network.\nTo reduce this memory overhead, TinyTL freezes the majority of the weights so they do not need to be updated during training. This eliminates the need to store intermediate activations for frozen parts of the network. TinyTL only finetunes the bias terms, which are much smaller than the weights. An overview of TinyTL workflow is shown in Figure 12.10.\n\n\n\n\n\n\nFigure 12.10: TinyTL workflow. Credit: Cai et al. (2020).)\n\n\nCai, Han, Chuang Gan, Ligeng Zhu, and Song Han. 2020. “TinyTL: Reduce Memory, Not Parameters for Efficient on-Device Learning.” In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/81f7acabd411274fcf65ce2070ed568a-Abstract.html.\n\n\nFreezing weights apply to fully connected layers as well as convolutional and normalization layers. However, only adapting the biases limits the model’s ability to learn and adapt to new data.\nTo increase adaptability without much additional memory, TinyTL uses a small residual learning model. This refines the intermediate feature maps to produce better outputs, even with fixed weights. The residual model introduces minimal overhead - less than 3.8% on top of the base model.\nBy freezing most weights, TinyTL significantly reduces memory usage during on-device training. The residual model then allows it to adapt and learn effectively for the task. The combined approach provides memory-efficient on-device training with minimal impact on model accuracy.\n\n\n12.7.3 Tiny Train\nTinyTrain significantly reduces the time required for on-device training by selectively updating only certain parts of the model. It does this using a technique called task-adaptive sparse updating, as shown in Figure 12.11.\nBased on the user data, memory, and computing available on the device, TinyTrain dynamically chooses which neural network layers to update during training. This layer selection is optimized to reduce computation and memory usage while maintaining high accuracy.\n\n\n\n\n\n\nFigure 12.11: TinyTrain workflow. Credit: Kwon et al. (2023).\n\n\nKwon, Young D, Rui Li, Stylianos I Venieris, Jagmohan Chauhan, Nicholas D Lane, and Cecilia Mascolo. 2023. “TinyTrain: Deep Neural Network Training at the Extreme Edge.” ArXiv Preprint abs/2307.09988. https://arxiv.org/abs/2307.09988.\n\n\nMore specifically, TinyTrain first does offline pretraining of the model. During pretraining, it not only trains the model on the task data but also meta-trains the model. Meta-training means training the model on metadata about the training process itself. This meta-learning improves the model’s ability to adapt accurately even when limited data is available for the target task.\nThen, during the online adaptation stage, when the model is being customized on the device, TinyTrain performs task-adaptive sparse updates. Using the criteria around the device’s capabilities, it selects only certain layers to update through backpropagation. The layers are chosen to balance accuracy, memory usage, and computation time.\nBy sparsely updating layers tailored to the device and task, TinyTrain significantly reduces on-device training time and resource usage. The offline meta-training also improves accuracy when adapting to limited data. Together, these methods enable fast, efficient, and accurate on-device training.\n\n\n12.7.4 Comparison\nHere is a table summarizing the key similarities and differences between the Tiny Training Engine, TinyTL, and TinyTrain frameworks:\n\n\n\n\n\n\n\n\nFramework\nSimilarities\nDifferences\n\n\n\n\nTiny Training Engine\nOn-device training Optimize memory & computation Leverage pruning, sparsity, etc\nTraces forward & backward graphs Prunes frozen weights Interleaves backprop & gradients Code generation\n\n\nTinyTL\nOn-device training Optimize memory & computation Leverage freezing, sparsity, etc\nFreezes most weights Only adapts biases Uses residual model\n\n\nTinyTrain\nOn-device training Optimize memory & computation Leverage sparsity, etc\nMeta-training in pretraining Task-adaptive sparse updating Selective layer updating", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1342,7 +1342,7 @@ "section": "12.8 Conclusion", "text": "12.8 Conclusion\nThe concept of on-device learning is increasingly important for increasing the usability and scalability of TinyML. This chapter explored the intricacies of on-device learning, exploring its advantages and limitations, adaptation strategies, key related algorithms and techniques, security implications, and existing and emerging on-device training frameworks.\nOn-device learning is, undoubtedly, a groundbreaking paradigm that brings forth numerous advantages for embedded and edge ML deployments. By performing training directly on the endpoint devices, on-device learning obviates the need for continuous cloud connectivity, making it particularly well-suited for IoT and edge computing applications. It comes with benefits such as improved privacy, ease of compliance, and resource efficiency. At the same time, on-device learning faces limitations related to hardware constraints, limited data size, and reduced model accuracy and generalization.\nMechanisms such as reduced model complexity, optimization and data compression techniques, and related learning methods such as transfer learning and federated learning allow models to adapt to learn and evolve under resource constraints, thus serving as the bedrock for effective ML on edge devices.\nThe critical security concerns in on-device learning highlighted in this chapter, ranging from data poisoning and adversarial attacks to specific risks introduced by on-device learning, must be addressed in real workloads for on-device learning to be a viable paradigm. Effective mitigation strategies, such as data validation, encryption, differential privacy, anomaly detection, and input data validation, are crucial to safeguard on-device learning systems from these threats.\nThe emergence of specialized on-device training frameworks like Tiny Training Engine, Tiny Transfer Learning, and Tiny Train presents practical tools to enable efficient on-device training. These frameworks employ various techniques to optimize memory usage, reduce computational overhead, and streamline the on-device training process.\nIn conclusion, on-device learning stands at the forefront of TinyML, promising a future where models can autonomously acquire knowledge and adapt to changing environments on edge devices. The application of on-device learning has the potential to revolutionize various domains, including healthcare, industrial IoT, and smart cities. However, the transformative potential of on-device learning must be balanced with robust security measures to protect against data breaches and adversarial threats. Embracing innovative on-device training frameworks and implementing stringent security protocols are key steps in unlocking the full potential of on-device learning. As this technology continues to evolve, it holds the promise of making our devices smarter, more responsive, and better integrated into our daily lives.", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1351,9 +1351,9 @@ "href": "contents/ondevice_learning/ondevice_learning.html#sec-on-device-learning-resource", "title": "12  On-Device Learning", "section": "12.9 Resources", - "text": "12.9 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides serve as a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nIntro to TensorFlow Lite (TFLite).\nTFLite Optimization and Quantization.\nTFLite Quantization-Aware Training.\nTransfer Learning:\n\nTransfer Learning: with Visual Wake Words example.\nOn-device Training and Transfer Learning.\n\nDistributed Training:\n\nDistributed Training.\nDistributed Training.\n\nContinuous Monitoring:\n\nContinuous Evaluation Challenges for TinyML.\nFederated Learning Challenges.\nContinuous Monitoring with Federated ML.\nContinuous Monitoring Impact on MLOps.\n\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 12.1\nExercise 12.2\nExercise 12.3\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we also offer a series of hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "12.9 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides serve as a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nIntro to TensorFlow Lite (TFLite).\nTFLite Optimization and Quantization.\nTFLite Quantization-Aware Training.\nTransfer Learning:\n\nTransfer Learning: with Visual Wake Words example.\nOn-device Training and Transfer Learning.\n\nDistributed Training:\n\nDistributed Training.\nDistributed Training.\n\nContinuous Monitoring:\n\nContinuous Evaluation Challenges for TinyML.\nFederated Learning Challenges.\nContinuous Monitoring with Federated ML.\nContinuous Monitoring Impact on MLOps.\n\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 12.1\nVideo 12.2\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 12.1\nExercise 12.2\nExercise 12.3\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we also offer a series of hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Deployment", "12  On-Device Learning" ] }, @@ -1364,7 +1364,7 @@ "section": "", "text": "13.1 Introduction\nMachine Learning Operations (MLOps) is a systematic approach that combines machine learning (ML), data science, and software engineering to automate the end-to-end ML lifecycle. This includes everything from data preparation and model training to deployment and maintenance. MLOps ensures that ML models are developed, deployed, and maintained efficiently and effectively.\nLet’s start by taking a general example (i.e., non-edge ML) case. Consider a ridesharing company that wants to deploy a machine-learning model to predict real-time rider demand. The data science team spends months developing a model, but when it’s time to deploy, they realize it needs to be compatible with the engineering team’s production environment. Deploying the model requires rebuilding it from scratch, which costs weeks of additional work. This is where MLOps comes in.\nWith MLOps, protocols, and tools, the model developed by the data science team can be seamlessly deployed and integrated into the production environment. In essence, MLOps removes friction during the development, deployment, and maintenance of ML systems. It improves collaboration between teams through defined workflows and interfaces. MLOps also accelerates iteration speed by enabling continuous delivery for ML models.\nFor the ridesharing company, implementing MLOps means their demand prediction model can be frequently retrained and deployed based on new incoming data. This keeps the model accurate despite changing rider behavior. MLOps also allows the company to experiment with new modeling techniques since models can be quickly tested and updated.\nOther MLOps benefits include enhanced model lineage tracking, reproducibility, and auditing. Cataloging ML workflows and standardizing artifacts - such as logging model versions, tracking data lineage, and packaging models and parameters - enables deeper insight into model provenance. Standardizing these artifacts facilitates tracing a model back to its origins, replicating the model development process, and examining how a model version has changed over time. This also facilitates regulation compliance, which is especially critical in regulated industries like healthcare and finance, where being able to audit and explain models is important.\nMajor organizations adopt MLOps to boost productivity, increase collaboration, and accelerate ML outcomes. It provides the frameworks, tools, and best practices to effectively manage ML systems throughout their lifecycle. This results in better-performing models, faster time-to-value, and sustained competitive advantage. As we explore MLOps further, consider how implementing these practices can help address embedded ML challenges today and in the future.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1373,9 +1373,9 @@ "href": "contents/ops/ops.html#historical-context", "title": "13  ML Operations", "section": "13.2 Historical Context", - "text": "13.2 Historical Context\nMLOps has its roots in DevOps, a set of practices combining software development (Dev) and IT operations (Ops) to shorten the development lifecycle and provide continuous delivery of high-quality software. The parallels between MLOps and DevOps are evident in their focus on automation, collaboration, and continuous improvement. In both cases, the goal is to break down silos between different teams (developers, operations, and, in the case of MLOps, data scientists and ML engineers) and to create a more streamlined and efficient process. It is useful to understand the history of this evolution better to understand MLOps in the context of traditional systems.\n\n13.2.1 DevOps\nThe term “DevOps” was first coined in 2009 by Patrick Debois, a consultant and Agile practitioner. Debois organized the first DevOpsDays conference in Ghent, Belgium, in 2009. The conference brought together development and operations professionals to discuss ways to improve collaboration and automate processes.\nDevOps has its roots in the Agile movement, which began in the early 2000s. Agile provided the foundation for a more collaborative approach to software development and emphasized small, iterative releases. However, Agile primarily focuses on collaboration between development teams. As Agile methodologies became more popular, organizations realized the need to extend this collaboration to operations teams.\nThe siloed nature of development and operations teams often led to inefficiencies, conflicts, and delays in software delivery. This need for better collaboration and integration between these teams led to the DevOps movement. DevOps can be seen as an extension of the Agile principles, including operations teams.\nThe key principles of DevOps include collaboration, automation, continuous integration, delivery, and feedback. DevOps focuses on automating the entire software delivery pipeline, from development to deployment. It aims to improve the collaboration between development and operations teams, utilizing tools like Jenkins, Docker, and Kubernetes to streamline the development lifecycle.\nWhile Agile and DevOps share common principles around collaboration and feedback, DevOps specifically targets integrating development and IT operations - expanding Agile beyond just development teams. It introduces practices and tools to automate software delivery and enhance the speed and quality of software releases.\n\n\n13.2.2 MLOps\nMLOps, on the other hand, stands for MLOps, and it extends the principles of DevOps to the ML lifecycle. MLOps aims to automate and streamline the end-to-end ML lifecycle, from data preparation and model development to deployment and monitoring. The main focus of MLOps is to facilitate collaboration between data scientists, data engineers, and IT operations and to automate the deployment, monitoring, and management of ML models. Some key factors led to the rise of MLOps.\n\nData drift: Data drift degrades model performance over time, motivating the need for rigorous monitoring and automated retraining procedures provided by MLOps.\nReproducibility: The lack of reproducibility in machine learning experiments motivated MLOps systems to track code, data, and environment variables to enable reproducible ML workflows.\nExplainability: The black box nature and lack of explainability of complex models motivated the need for MLOps capabilities to increase model transparency and explainability.\nMonitoring: The inability to reliably monitor model performance post-deployment highlighted the need for MLOps solutions with robust model performance instrumentation and alerting.\nFriction: The friction in manually retraining and deploying models motivated the need for MLOps systems that automate machine learning deployment pipelines.\nOptimization: The complexity of configuring machine learning infrastructure motivated the need for MLOps platforms with optimized, ready-made ML infrastructure.\n\nWhile DevOps and MLOps share the common goal of automating and streamlining processes, their focus and challenges differ. DevOps primarily deals with the challenges of software development and IT operations. In contrast, MLOps deals with the additional complexities of managing ML models, such as data versioning, model versioning, and model monitoring. MLOps also requires stakeholder collaboration, including data scientists, engineers, and IT operations.\nWhile DevOps and MLOps share similarities in their goals and principles, they differ in their focus and challenges. DevOps focuses on improving the collaboration between development and operations teams and automating software delivery. In contrast, MLOps focuses on streamlining and automating the ML lifecycle and facilitating collaboration between data scientists, data engineers, and IT operations.\nTable 13.1 compares and summarizes them side by side.\n\n\n\nTable 13.1: Comparison of DevOps and MLOps.\n\n\n\n\n\n\n\n\n\n\nAspect\nDevOps\nMLOps\n\n\n\n\nObjective\nStreamlining software development and operations processes\nOptimizing the lifecycle of machine learning models\n\n\nMethodology\nContinuous Integration and Continuous Delivery (CI/CD) for software development\nSimilar to CI/CD but focuses on machine learning workflows\n\n\nPrimary Tools\nVersion control (Git), CI/CD tools (Jenkins, Travis CI), Configuration management (Ansible, Puppet)\nData versioning tools, Model training and deployment tools, CI/CD pipelines tailored for ML\n\n\nPrimary Concerns\nCode integration, Testing, Release management, Automation, Infrastructure as code\nData management, Model versioning, Experiment tracking, Model deployment, Scalability of ML workflows\n\n\nTypical Outcomes\nFaster and more reliable software releases, Improved collaboration between development and operations teams\nEfficient management and deployment of machine learning models, Enhanced collaboration between data scientists and engineers\n\n\n\n\n\n\nLearn more about ML Lifecycles through a case study featuring speech recognition.", + "text": "13.2 Historical Context\nMLOps has its roots in DevOps, a set of practices combining software development (Dev) and IT operations (Ops) to shorten the development lifecycle and provide continuous delivery of high-quality software. The parallels between MLOps and DevOps are evident in their focus on automation, collaboration, and continuous improvement. In both cases, the goal is to break down silos between different teams (developers, operations, and, in the case of MLOps, data scientists and ML engineers) and to create a more streamlined and efficient process. It is useful to understand the history of this evolution better to understand MLOps in the context of traditional systems.\n\n13.2.1 DevOps\nThe term “DevOps” was first coined in 2009 by Patrick Debois, a consultant and Agile practitioner. Debois organized the first DevOpsDays conference in Ghent, Belgium, in 2009. The conference brought together development and operations professionals to discuss ways to improve collaboration and automate processes.\nDevOps has its roots in the Agile movement, which began in the early 2000s. Agile provided the foundation for a more collaborative approach to software development and emphasized small, iterative releases. However, Agile primarily focuses on collaboration between development teams. As Agile methodologies became more popular, organizations realized the need to extend this collaboration to operations teams.\nThe siloed nature of development and operations teams often led to inefficiencies, conflicts, and delays in software delivery. This need for better collaboration and integration between these teams led to the DevOps movement. DevOps can be seen as an extension of the Agile principles, including operations teams.\nThe key principles of DevOps include collaboration, automation, continuous integration, delivery, and feedback. DevOps focuses on automating the entire software delivery pipeline, from development to deployment. It aims to improve the collaboration between development and operations teams, utilizing tools like Jenkins, Docker, and Kubernetes to streamline the development lifecycle.\nWhile Agile and DevOps share common principles around collaboration and feedback, DevOps specifically targets integrating development and IT operations - expanding Agile beyond just development teams. It introduces practices and tools to automate software delivery and enhance the speed and quality of software releases.\n\n\n13.2.2 MLOps\nMLOps, on the other hand, stands for MLOps, and it extends the principles of DevOps to the ML lifecycle. MLOps aims to automate and streamline the end-to-end ML lifecycle, from data preparation and model development to deployment and monitoring. The main focus of MLOps is to facilitate collaboration between data scientists, data engineers, and IT operations and to automate the deployment, monitoring, and management of ML models. Some key factors led to the rise of MLOps.\n\nData drift: Data drift degrades model performance over time, motivating the need for rigorous monitoring and automated retraining procedures provided by MLOps.\nReproducibility: The lack of reproducibility in machine learning experiments motivated MLOps systems to track code, data, and environment variables to enable reproducible ML workflows.\nExplainability: The black box nature and lack of explainability of complex models motivated the need for MLOps capabilities to increase model transparency and explainability.\nMonitoring: The inability to reliably monitor model performance post-deployment highlighted the need for MLOps solutions with robust model performance instrumentation and alerting.\nFriction: The friction in manually retraining and deploying models motivated the need for MLOps systems that automate machine learning deployment pipelines.\nOptimization: The complexity of configuring machine learning infrastructure motivated the need for MLOps platforms with optimized, ready-made ML infrastructure.\n\nWhile DevOps and MLOps share the common goal of automating and streamlining processes, their focus and challenges differ. DevOps primarily deals with the challenges of software development and IT operations. In contrast, MLOps deals with the additional complexities of managing ML models, such as data versioning, model versioning, and model monitoring. MLOps also requires stakeholder collaboration, including data scientists, engineers, and IT operations.\nWhile DevOps and MLOps share similarities in their goals and principles, they differ in their focus and challenges. DevOps focuses on improving the collaboration between development and operations teams and automating software delivery. In contrast, MLOps focuses on streamlining and automating the ML lifecycle and facilitating collaboration between data scientists, data engineers, and IT operations.\nTable 13.1 compares and summarizes them side by side.\n\n\n\nTable 13.1: Comparison of DevOps and MLOps.\n\n\n\n\n\n\n\n\n\n\nAspect\nDevOps\nMLOps\n\n\n\n\nObjective\nStreamlining software development and operations processes\nOptimizing the lifecycle of machine learning models\n\n\nMethodology\nContinuous Integration and Continuous Delivery (CI/CD) for software development\nSimilar to CI/CD but focuses on machine learning workflows\n\n\nPrimary Tools\nVersion control (Git), CI/CD tools (Jenkins, Travis CI), Configuration management (Ansible, Puppet)\nData versioning tools, Model training and deployment tools, CI/CD pipelines tailored for ML\n\n\nPrimary Concerns\nCode integration, Testing, Release management, Automation, Infrastructure as code\nData management, Model versioning, Experiment tracking, Model deployment, Scalability of ML workflows\n\n\nTypical Outcomes\nFaster and more reliable software releases, Improved collaboration between development and operations teams\nEfficient management and deployment of machine learning models, Enhanced collaboration between data scientists and engineers\n\n\n\n\n\n\nLearn more about ML Lifecycles through a case study featuring speech recognition in Video 13.1.\n\n\n\n\n\n\nVideo 13.1: MLOps", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1384,9 +1384,9 @@ "href": "contents/ops/ops.html#key-components-of-mlops", "title": "13  ML Operations", "section": "13.3 Key Components of MLOps", - "text": "13.3 Key Components of MLOps\nIn this chapter, we will provide an overview of the core components of MLOps, an emerging set of practices that enables robust delivery and lifecycle management of ML models in production. While some MLOps elements like automation and monitoring were covered in previous chapters, we will integrate them into an integrated framework and expand on additional capabilities like governance. Additionally, we will describe and link to popular tools used within each component, such as LabelStudio for data labeling. By the end, we hope that you will understand the end-to-end MLOps methodology that takes models from ideation to sustainable value creation within organizations.\n\n13.3.1 Data Management\nRobust data management and data engineering actively empower successful MLOps implementations. Teams properly ingest, store, and prepare raw data from sensors, databases, apps, and other systems for model training and deployment.\nTeams actively track changes to datasets over time using version control with Git and tools like GitHub or GitLab. Data scientists collaborate on curating datasets by merging changes from multiple contributors. Teams can review or roll back each iteration of a dataset if needed.\nTeams meticulously label and annotate data using labeling software like LabelStudio, which enables distributed teams to work on tagging datasets together. As the target variables and labeling conventions evolve, teams maintain accessibility to earlier versions.\nTeams store the raw dataset and all derived assets on cloud storage services like Amazon S3 or Google Cloud Storage. These services provide scalable, resilient storage with versioning capabilities. Teams can set granular access permissions.\nRobust data pipelines created by teams automate raw data extraction, joining, cleansing, and transformation into analysis-ready datasets. Prefect, Apache Airflow, and dbt are workflow orchestrators that allow engineers to develop flexible, reusable data processing pipelines.\nFor instance, a pipeline may ingest data from PostgreSQL databases, REST APIs, and CSVs stored on S3. It can filter, deduplicate, and aggregate the data, handle errors, and save the output to S3. The pipeline can also push the transformed data into a feature store like Tecton or Feast for low-latency access.\nIn an industrial predictive maintenance use case, sensor data is ingested from devices into S3. A perfect pipeline processes the sensor data, joining it with maintenance records. The enriched dataset is stored in Feast so models can easily retrieve the latest data for training and predictions.\nThe video below is a short overview of data pipelines.\n\n\n\n13.3.2 CI/CD Pipelines\nContinuous integration and continuous delivery (CI/CD) pipelines actively automate the progression of ML models from initial development into production deployment. Adapted for ML systems, CI/CD principles empower teams to rapidly and robustly deliver new models with minimized manual errors.\nCI/CD pipelines orchestrate key steps, including checking out new code changes, transforming data, training and registering new models, validation testing, containerization, deploying to environments like staging clusters, and promoting to production. Teams leverage popular CI/CD solutions like Jenkins, CircleCI and GitHub Actions to execute these MLOps pipelines, while Prefect, Metaflow and Kubeflow offer ML-focused options.\nFigure 13.1 illustrates a CI/CD pipeline specifically tailored for MLOps. The process starts with a dataset and feature repository (on the left), which feeds into a dataset ingestion stage. Post-ingestion, the data undergoes validation to ensure its quality before being transformed for training. Parallel to this, a retraining trigger can initiate the pipeline based on specified criteria. The data then passes through a model training/tuning phase within a data processing engine, followed by model evaluation and validation. Once validated, the model is registered and stored in a machine learning metadata and artifact repository. The final stage involves deploying the trained model back into the dataset and feature repository, thereby creating a cyclical process for continuous improvement and deployment of machine learning models.\n\n\n\n\n\n\nFigure 13.1: MLOps CI/CD diagram. Credit: HarvardX.\n\n\n\nFor example, when a data scientist checks improvements to an image classification model into a GitHub repository, this actively triggers a Jenkins CI/CD pipeline. The pipeline reruns data transformations and model training on the latest data, tracking experiments with MLflow. After automated validation testing, teams deploy the model container to a Kubernetes staging cluster for further QA. Once approved, Jenkins facilitates a phased rollout of the model to production with canary deployments to catch any issues. If anomalies are detected, the pipeline enables teams to roll back to the previous model version gracefully.\nCI/CD pipelines empower teams to iterate and deliver ML models rapidly by connecting the disparate steps from development to deployment under continuous automation. Integrating MLOps tools like MLflow enhances model packaging, versioning, and pipeline traceability. CI/CD is integral for progressing models beyond prototypes into sustainable business systems.\n\n\n13.3.3 Model Training\nIn the model training phase, data scientists actively experiment with different ML architectures and algorithms to create optimized models that extract insights and patterns from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible.\nModern ML frameworks like TensorFlow, PyTorch and Keras provide pre-built components that simplify designing neural networks and other model architectures. Data scientists leverage built-in modules for layers, activations, losses, etc., and high-level APIs like Keras to focus more on model architecture.\nMLOps enables teams to package model training code into reusable, tracked scripts and notebooks. As models are developed, capabilities like hyperparameter tuning, neural architecture search and automatic feature selection rapidly iterate to find the best-performing configurations.\nTeams use Git to version control training code and host it in repositories like GitHub to track changes over time. This allows seamless collaboration between data scientists.\nNotebooks like Jupyter create an excellent interactive model development environment. The notebooks contain data ingestion, preprocessing, model declaration, training loop, evaluation, and export code in one reproducible document.\nFinally, teams orchestrate model training as part of a CI/CD pipeline for automation. For instance, a Jenkins pipeline can trigger a Python script to load new training data, retrain a TensorFlow classifier, evaluate model metrics, and automatically register the model if performance thresholds are met.\nAn example workflow has a data scientist using a PyTorch notebook to develop a CNN model for image classification. The fastai library provides high-level APIs to simplify training CNNs on image datasets. The notebook trains the model on sample data, evaluates accuracy metrics, and tunes hyperparameters like learning rate and layers to optimize performance. This reproducible notebook is version-controlled and integrated into a retraining pipeline.\nAutomating and standardizing model training empowers teams to accelerate experimentation and achieve the rigor needed to produce ML systems.\n\n\n13.3.4 Model Evaluation\nBefore deploying models, teams perform rigorous evaluation and testing to validate meeting performance benchmarks and readiness for release. MLOps introduces best practices around model validation, auditing, and canary testing.\nTeams typically evaluate models against holdout test datasets that are not used during training. The test data originates from the same distribution as production data. Teams calculate metrics like accuracy, AUC, precision, recall, and F1 score.\nTeams also track the same metrics over time against test data samples. If evaluation data comes from live production streams, this catches data drifts that degrade model performance over time.\nHuman oversight for model release remains important. Data scientists review performance across key segments and slices. Error analysis helps identify model weaknesses to guide enhancement. Teams apply fairness and bias detection techniques.\nCanary testing releases a model to a small subset of users to evaluate real-world performance before wide deployment. Teams incrementally route traffic to the canary release while monitoring for issues.\nFor example, a retailer evaluates a personalized product recommendation model against historical test data, reviewing accuracy and diversity metrics. Teams also calculate metrics on live customer data over time, detecting decreased accuracy over the last 2 weeks. Before full rollout, the new model is released to 5% of web traffic to ensure no degradation.\nAutomating evaluation and canary releases reduces deployment risks. However, human review still needs to be more critical to assess less quantifiable dynamics of model behavior. Rigorous pre-deployment validation provides confidence in putting models into production.\n\n\n13.3.5 Model Deployment\nTeams need to properly package, test, and track ML models to reliably deploy them to production. MLOps introduces frameworks and procedures for actively versioning, deploying, monitoring, and updating models in sustainable ways.\nTeams containerize models using Docker, which bundles code, libraries, and dependencies into a standardized unit. Containers enable smooth portability across environments.\nFrameworks like TensorFlow Serving and BentoML help serve predictions from deployed models via performance-optimized APIs. These frameworks handle versioning, scaling, and monitoring.\nTeams first deploy updated models to staging or QA environments for testing before full production rollout. Shadow or canary deployments route a sample of traffic to test model variants. Teams incrementally increase access to new models.\nTeams build robust rollback procedures in case issues emerge. Rollbacks revert to the last known good model version. Integration with CI/CD pipelines simplifies redeployment if needed.\nTeams carefully track model artifacts, such as scripts, weights, logs, and metrics, for each version with ML metadata tools like MLflow. This maintains lineage and auditability.\nFor example, a retailer containerizes a product recommendation model in TensorFlow Serving and deploys it to a Kubernetes staging cluster. After monitoring and approving performance on sample traffic, Kubernetes shifts 10% of production traffic to the new model. If no issues are detected after a few days, the new model takes over 100% of traffic. However, teams should keep the previous version accessible for rollback if needed.\nModel deployment processes enable teams to make ML systems resilient in production by accounting for all transition states.\n\n\n13.3.6 Infrastructure Management\nMLOps teams heavily leverage infrastructure as code (IaC) tools and robust cloud architectures to actively manage the resources needed for development, training, and deployment of ML systems.\nTeams use IaC tools like Terraform, CloudFormation and Ansible to programmatically define, provision and update infrastructure in a version controlled manner. For MLOps, teams widely use Terraform to spin up resources on AWS, GCP and Azure.\nFor model building and training, teams dynamically provision computing resources like GPU servers, container clusters, storage, and databases through Terraform as needed by data scientists. Code encapsulates and preserves infrastructure definitions.\nContainers and orchestrators like Docker and Kubernetes allow teams to package models and reliably deploy them across different environments. Containers can be predictably spun up or down automatically based on demand.\nBy leveraging cloud elasticity, teams scale resources up and down to meet spikes in workloads like hyperparameter tuning jobs or spikes in prediction requests. Auto-scaling enables optimized cost efficiency.\nInfrastructure spans on-prem, cloud, and edge devices. A robust technology stack provides flexibility and resilience. Monitoring tools allow teams to observe resource utilization.\nFor example, a Terraform config may deploy a GCP Kubernetes cluster to host trained TensorFlow models exposed as prediction microservices. The cluster scales up pods to handle increased traffic. CI/CD integration seamlessly rolls out new model containers.\nCarefully managing infrastructure through IaC and monitoring enables teams to prevent bottlenecks in operationalizing ML systems at scale.\n\n\n13.3.7 Monitoring\nMLOps teams actively maintain robust monitoring to sustain visibility into ML models deployed in production. Continuous monitoring provides insights into model and system performance so teams can rapidly detect and address issues to minimize disruption.\nTeams actively monitor key model aspects, including analyzing samples of live predictions to track metrics like accuracy and confusion matrix over time.\nWhen monitoring performance, teams must profile incoming data to check for model drift - a steady decline in model accuracy after production deployment. Model drift can occur in two ways: concept drift and data drift. Concept drift refers to a fundamental change observed in the relationship between the input data and the target outcomes. For instance, as the COVID-19 pandemic progressed, e-commerce and retail sites had to correct their model recommendations since purchase data was overwhelmingly skewed towards items like hand sanitizer. Data drift describes changes in the distribution of data over time. For example, image recognition algorithms used in self-driving cars must account for seasonality in observing their surroundings. Teams also track application performance metrics like latency and errors for model integrations.\nFrom an infrastructure perspective, teams monitor for capacity issues like high CPU, memory, and disk utilization and system outages. Tools like Prometheus, Grafana, and Elastic enable teams to actively collect, analyze, query, and visualize diverse monitoring metrics. Dashboards make dynamics highly visible.\nTeams configure alerting for key monitoring metrics like accuracy declines and system faults to enable proactively responding to events that threaten reliability. For example, drops in model accuracy trigger alerts for teams to investigate potential data drift and retrain models using updated, representative data samples.\nAfter deployment, comprehensive monitoring enables teams to maintain confidence in model and system health. It empowers teams to catch and resolve deviations preemptively through data-driven alerts and dashboards. Active monitoring is essential for maintaining highly available, trustworthy ML systems.\nWatch the video below to learn more about monitoring.\n\n\n\n13.3.8 Governance\nMLOps teams actively establish proper governance practices as a critical component. Governance provides oversight into ML models to ensure they are trustworthy, ethical, and compliant. Without governance, significant risks exist of models behaving in dangerous or prohibited ways when deployed in applications and business processes.\nMLOps governance employs techniques to provide transparency into model predictions, performance, and behavior throughout the ML lifecycle. Explainability methods like SHAP and LIME help auditors understand why models make certain predictions by highlighting influential input features behind decisions. Bias detection analyzes model performance across different demographic groups defined by attributes like age, gender, and ethnicity to detect any systematic skews. Teams perform rigorous testing procedures on representative datasets to validate model performance before deployment.\nOnce in production, teams monitor concept drift to determine whether predictive relationships change over time in ways that degrade model accuracy. Teams also analyze production logs to uncover patterns in the types of errors models generate. Documentation about data provenance, development procedures, and evaluation metrics provides additional visibility.\nPlatforms like Watson OpenScale incorporate governance capabilities like bias monitoring and explainability directly into model building, testing, and production monitoring. The key focus areas of governance are transparency, fairness, and compliance. This minimizes the risks of models behaving incorrectly or dangerously when integrated into business processes. Embedding governance practices into MLOps workflows enables teams to ensure trustworthy AI.\n\n\n13.3.9 Communication & Collaboration\nMLOps actively breaks down silos and enables the free flow of information and insights between teams through all ML lifecycle stages. Tools like MLflow, Weights & Biases, and data contexts provide traceability and visibility to improve collaboration.\nTeams use MLflow to systematize tracking of model experiments, versions, and artifacts. Experiments can be programmatically logged from data science notebooks and training jobs. The model registry provides a central hub for teams to store production-ready models before deployment, with metadata like descriptions, metrics, tags, and lineage. Integrations with Github, GitLab facilitate code change triggers.\nWeights & Biases provides collaborative tools tailored to ML teams. Data scientists log experiments, visualize metrics like loss curves, and share experimentation insights with colleagues. Comparison dashboards highlight model differences. Teams discuss progress and next steps.\nEstablishing shared data contexts—glossaries, data dictionaries, and schema references—ensures alignment on data meaning and usage across roles. Documentation aids understanding for those without direct data access.\nFor example, a data scientist may use Weights & Biases to analyze an anomaly detection model experiment and share the evaluation results with other team members to discuss improvements. The final model can then be registered with MLflow before handing off for deployment.\nEnabling transparency, traceability, and communication via MLOps empowers teams to remove bottlenecks and accelerate the delivery of impactful ML systems.\nThe following video covers key challenges in model deployment, including concept drift, model drift, and software engineering issues.", + "text": "13.3 Key Components of MLOps\nIn this chapter, we will provide an overview of the core components of MLOps, an emerging set of practices that enables robust delivery and lifecycle management of ML models in production. While some MLOps elements like automation and monitoring were covered in previous chapters, we will integrate them into an integrated framework and expand on additional capabilities like governance. Additionally, we will describe and link to popular tools used within each component, such as LabelStudio for data labeling. By the end, we hope that you will understand the end-to-end MLOps methodology that takes models from ideation to sustainable value creation within organizations.\n\n13.3.1 Data Management\nRobust data management and data engineering actively empower successful MLOps implementations. Teams properly ingest, store, and prepare raw data from sensors, databases, apps, and other systems for model training and deployment.\nTeams actively track changes to datasets over time using version control with Git and tools like GitHub or GitLab. Data scientists collaborate on curating datasets by merging changes from multiple contributors. Teams can review or roll back each iteration of a dataset if needed.\nTeams meticulously label and annotate data using labeling software like LabelStudio, which enables distributed teams to work on tagging datasets together. As the target variables and labeling conventions evolve, teams maintain accessibility to earlier versions.\nTeams store the raw dataset and all derived assets on cloud storage services like Amazon S3 or Google Cloud Storage. These services provide scalable, resilient storage with versioning capabilities. Teams can set granular access permissions.\nRobust data pipelines created by teams automate raw data extraction, joining, cleansing, and transformation into analysis-ready datasets. Prefect, Apache Airflow, and dbt are workflow orchestrators that allow engineers to develop flexible, reusable data processing pipelines.\nFor instance, a pipeline may ingest data from PostgreSQL databases, REST APIs, and CSVs stored on S3. It can filter, deduplicate, and aggregate the data, handle errors, and save the output to S3. The pipeline can also push the transformed data into a feature store like Tecton or Feast for low-latency access.\nIn an industrial predictive maintenance use case, sensor data is ingested from devices into S3. A perfect pipeline processes the sensor data, joining it with maintenance records. The enriched dataset is stored in Feast so models can easily retrieve the latest data for training and predictions.\nVideo 13.2 below is a short overview of data pipelines.\n\n\n\n\n\n\nVideo 13.2: Data Pipelines\n\n\n\n\n\n\n\n\n13.3.2 CI/CD Pipelines\nContinuous integration and continuous delivery (CI/CD) pipelines actively automate the progression of ML models from initial development into production deployment. Adapted for ML systems, CI/CD principles empower teams to rapidly and robustly deliver new models with minimized manual errors.\nCI/CD pipelines orchestrate key steps, including checking out new code changes, transforming data, training and registering new models, validation testing, containerization, deploying to environments like staging clusters, and promoting to production. Teams leverage popular CI/CD solutions like Jenkins, CircleCI and GitHub Actions to execute these MLOps pipelines, while Prefect, Metaflow and Kubeflow offer ML-focused options.\nFigure 13.1 illustrates a CI/CD pipeline specifically tailored for MLOps. The process starts with a dataset and feature repository (on the left), which feeds into a dataset ingestion stage. Post-ingestion, the data undergoes validation to ensure its quality before being transformed for training. Parallel to this, a retraining trigger can initiate the pipeline based on specified criteria. The data then passes through a model training/tuning phase within a data processing engine, followed by model evaluation and validation. Once validated, the model is registered and stored in a machine learning metadata and artifact repository. The final stage involves deploying the trained model back into the dataset and feature repository, thereby creating a cyclical process for continuous improvement and deployment of machine learning models.\n\n\n\n\n\n\nFigure 13.1: MLOps CI/CD diagram. Credit: HarvardX.\n\n\n\nFor example, when a data scientist checks improvements to an image classification model into a GitHub repository, this actively triggers a Jenkins CI/CD pipeline. The pipeline reruns data transformations and model training on the latest data, tracking experiments with MLflow. After automated validation testing, teams deploy the model container to a Kubernetes staging cluster for further QA. Once approved, Jenkins facilitates a phased rollout of the model to production with canary deployments to catch any issues. If anomalies are detected, the pipeline enables teams to roll back to the previous model version gracefully.\nCI/CD pipelines empower teams to iterate and deliver ML models rapidly by connecting the disparate steps from development to deployment under continuous automation. Integrating MLOps tools like MLflow enhances model packaging, versioning, and pipeline traceability. CI/CD is integral for progressing models beyond prototypes into sustainable business systems.\n\n\n13.3.3 Model Training\nIn the model training phase, data scientists actively experiment with different ML architectures and algorithms to create optimized models that extract insights and patterns from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible.\nModern ML frameworks like TensorFlow, PyTorch and Keras provide pre-built components that simplify designing neural networks and other model architectures. Data scientists leverage built-in modules for layers, activations, losses, etc., and high-level APIs like Keras to focus more on model architecture.\nMLOps enables teams to package model training code into reusable, tracked scripts and notebooks. As models are developed, capabilities like hyperparameter tuning, neural architecture search and automatic feature selection rapidly iterate to find the best-performing configurations.\nTeams use Git to version control training code and host it in repositories like GitHub to track changes over time. This allows seamless collaboration between data scientists.\nNotebooks like Jupyter create an excellent interactive model development environment. The notebooks contain data ingestion, preprocessing, model declaration, training loop, evaluation, and export code in one reproducible document.\nFinally, teams orchestrate model training as part of a CI/CD pipeline for automation. For instance, a Jenkins pipeline can trigger a Python script to load new training data, retrain a TensorFlow classifier, evaluate model metrics, and automatically register the model if performance thresholds are met.\nAn example workflow has a data scientist using a PyTorch notebook to develop a CNN model for image classification. The fastai library provides high-level APIs to simplify training CNNs on image datasets. The notebook trains the model on sample data, evaluates accuracy metrics, and tunes hyperparameters like learning rate and layers to optimize performance. This reproducible notebook is version-controlled and integrated into a retraining pipeline.\nAutomating and standardizing model training empowers teams to accelerate experimentation and achieve the rigor needed to produce ML systems.\n\n\n13.3.4 Model Evaluation\nBefore deploying models, teams perform rigorous evaluation and testing to validate meeting performance benchmarks and readiness for release. MLOps introduces best practices around model validation, auditing, and canary testing.\nTeams typically evaluate models against holdout test datasets that are not used during training. The test data originates from the same distribution as production data. Teams calculate metrics like accuracy, AUC, precision, recall, and F1 score.\nTeams also track the same metrics over time against test data samples. If evaluation data comes from live production streams, this catches data drifts that degrade model performance over time.\nHuman oversight for model release remains important. Data scientists review performance across key segments and slices. Error analysis helps identify model weaknesses to guide enhancement. Teams apply fairness and bias detection techniques.\nCanary testing releases a model to a small subset of users to evaluate real-world performance before wide deployment. Teams incrementally route traffic to the canary release while monitoring for issues.\nFor example, a retailer evaluates a personalized product recommendation model against historical test data, reviewing accuracy and diversity metrics. Teams also calculate metrics on live customer data over time, detecting decreased accuracy over the last 2 weeks. Before full rollout, the new model is released to 5% of web traffic to ensure no degradation.\nAutomating evaluation and canary releases reduces deployment risks. However, human review still needs to be more critical to assess less quantifiable dynamics of model behavior. Rigorous pre-deployment validation provides confidence in putting models into production.\n\n\n13.3.5 Model Deployment\nTeams need to properly package, test, and track ML models to reliably deploy them to production. MLOps introduces frameworks and procedures for actively versioning, deploying, monitoring, and updating models in sustainable ways.\nTeams containerize models using Docker, which bundles code, libraries, and dependencies into a standardized unit. Containers enable smooth portability across environments.\nFrameworks like TensorFlow Serving and BentoML help serve predictions from deployed models via performance-optimized APIs. These frameworks handle versioning, scaling, and monitoring.\nTeams first deploy updated models to staging or QA environments for testing before full production rollout. Shadow or canary deployments route a sample of traffic to test model variants. Teams incrementally increase access to new models.\nTeams build robust rollback procedures in case issues emerge. Rollbacks revert to the last known good model version. Integration with CI/CD pipelines simplifies redeployment if needed.\nTeams carefully track model artifacts, such as scripts, weights, logs, and metrics, for each version with ML metadata tools like MLflow. This maintains lineage and auditability.\nFor example, a retailer containerizes a product recommendation model in TensorFlow Serving and deploys it to a Kubernetes staging cluster. After monitoring and approving performance on sample traffic, Kubernetes shifts 10% of production traffic to the new model. If no issues are detected after a few days, the new model takes over 100% of traffic. However, teams should keep the previous version accessible for rollback if needed.\nModel deployment processes enable teams to make ML systems resilient in production by accounting for all transition states.\n\n\n13.3.6 Infrastructure Management\nMLOps teams heavily leverage infrastructure as code (IaC) tools and robust cloud architectures to actively manage the resources needed for development, training, and deployment of ML systems.\nTeams use IaC tools like Terraform, CloudFormation and Ansible to programmatically define, provision and update infrastructure in a version controlled manner. For MLOps, teams widely use Terraform to spin up resources on AWS, GCP and Azure.\nFor model building and training, teams dynamically provision computing resources like GPU servers, container clusters, storage, and databases through Terraform as needed by data scientists. Code encapsulates and preserves infrastructure definitions.\nContainers and orchestrators like Docker and Kubernetes allow teams to package models and reliably deploy them across different environments. Containers can be predictably spun up or down automatically based on demand.\nBy leveraging cloud elasticity, teams scale resources up and down to meet spikes in workloads like hyperparameter tuning jobs or spikes in prediction requests. Auto-scaling enables optimized cost efficiency.\nInfrastructure spans on-prem, cloud, and edge devices. A robust technology stack provides flexibility and resilience. Monitoring tools allow teams to observe resource utilization.\nFor example, a Terraform config may deploy a GCP Kubernetes cluster to host trained TensorFlow models exposed as prediction microservices. The cluster scales up pods to handle increased traffic. CI/CD integration seamlessly rolls out new model containers.\nCarefully managing infrastructure through IaC and monitoring enables teams to prevent bottlenecks in operationalizing ML systems at scale.\n\n\n13.3.7 Monitoring\nMLOps teams actively maintain robust monitoring to sustain visibility into ML models deployed in production. Continuous monitoring provides insights into model and system performance so teams can rapidly detect and address issues to minimize disruption.\nTeams actively monitor key model aspects, including analyzing samples of live predictions to track metrics like accuracy and confusion matrix over time.\nWhen monitoring performance, teams must profile incoming data to check for model drift - a steady decline in model accuracy after production deployment. Model drift can occur in two ways: concept drift and data drift. Concept drift refers to a fundamental change observed in the relationship between the input data and the target outcomes. For instance, as the COVID-19 pandemic progressed, e-commerce and retail sites had to correct their model recommendations since purchase data was overwhelmingly skewed towards items like hand sanitizer. Data drift describes changes in the distribution of data over time. For example, image recognition algorithms used in self-driving cars must account for seasonality in observing their surroundings. Teams also track application performance metrics like latency and errors for model integrations.\nFrom an infrastructure perspective, teams monitor for capacity issues like high CPU, memory, and disk utilization and system outages. Tools like Prometheus, Grafana, and Elastic enable teams to actively collect, analyze, query, and visualize diverse monitoring metrics. Dashboards make dynamics highly visible.\nTeams configure alerting for key monitoring metrics like accuracy declines and system faults to enable proactively responding to events that threaten reliability. For example, drops in model accuracy trigger alerts for teams to investigate potential data drift and retrain models using updated, representative data samples.\nAfter deployment, comprehensive monitoring enables teams to maintain confidence in model and system health. It empowers teams to catch and resolve deviations preemptively through data-driven alerts and dashboards. Active monitoring is essential for maintaining highly available, trustworthy ML systems.\nWatch the video below to learn more about monitoring.\n\n\n\n\n\n\nVideo 13.3: Model Monitoring\n\n\n\n\n\n\n\n\n13.3.8 Governance\nMLOps teams actively establish proper governance practices as a critical component. Governance provides oversight into ML models to ensure they are trustworthy, ethical, and compliant. Without governance, significant risks exist of models behaving in dangerous or prohibited ways when deployed in applications and business processes.\nMLOps governance employs techniques to provide transparency into model predictions, performance, and behavior throughout the ML lifecycle. Explainability methods like SHAP and LIME help auditors understand why models make certain predictions by highlighting influential input features behind decisions. Bias detection analyzes model performance across different demographic groups defined by attributes like age, gender, and ethnicity to detect any systematic skews. Teams perform rigorous testing procedures on representative datasets to validate model performance before deployment.\nOnce in production, teams monitor concept drift to determine whether predictive relationships change over time in ways that degrade model accuracy. Teams also analyze production logs to uncover patterns in the types of errors models generate. Documentation about data provenance, development procedures, and evaluation metrics provides additional visibility.\nPlatforms like Watson OpenScale incorporate governance capabilities like bias monitoring and explainability directly into model building, testing, and production monitoring. The key focus areas of governance are transparency, fairness, and compliance. This minimizes the risks of models behaving incorrectly or dangerously when integrated into business processes. Embedding governance practices into MLOps workflows enables teams to ensure trustworthy AI.\n\n\n13.3.9 Communication & Collaboration\nMLOps actively breaks down silos and enables the free flow of information and insights between teams through all ML lifecycle stages. Tools like MLflow, Weights & Biases, and data contexts provide traceability and visibility to improve collaboration.\nTeams use MLflow to systematize tracking of model experiments, versions, and artifacts. Experiments can be programmatically logged from data science notebooks and training jobs. The model registry provides a central hub for teams to store production-ready models before deployment, with metadata like descriptions, metrics, tags, and lineage. Integrations with Github, GitLab facilitate code change triggers.\nWeights & Biases provides collaborative tools tailored to ML teams. Data scientists log experiments, visualize metrics like loss curves, and share experimentation insights with colleagues. Comparison dashboards highlight model differences. Teams discuss progress and next steps.\nEstablishing shared data contexts—glossaries, data dictionaries, and schema references—ensures alignment on data meaning and usage across roles. Documentation aids understanding for those without direct data access.\nFor example, a data scientist may use Weights & Biases to analyze an anomaly detection model experiment and share the evaluation results with other team members to discuss improvements. The final model can then be registered with MLflow before handing off for deployment.\nEnabling transparency, traceability, and communication via MLOps empowers teams to remove bottlenecks and accelerate the delivery of impactful ML systems.\nVideo 13.4 covers key challenges in model deployment, including concept drift, model drift, and software engineering issues.\n\n\n\n\n\n\nVideo 13.4: Deployment Challenges", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1397,7 +1397,7 @@ "section": "13.4 Hidden Technical Debt in ML Systems", "text": "13.4 Hidden Technical Debt in ML Systems\nTechnical debt is increasingly pressing for ML systems (see Figure 14.2). This metaphor, originally proposed in the 1990s, likens the long-term costs of quick software development to financial debt. Just as some financial debt powers beneficial growth, carefully managed technical debt enables rapid iteration. However, left unchecked, accumulating technical debt can outweigh any gains.\nFigure 13.2 illustrates the various components contributing to ML systems’ hidden technical debt. It shows the interconnected nature of configuration, data collection, and feature extraction, which is foundational to the ML codebase. The box sizes indicate the proportion of the entire system represented by each component. In industry ML systems, the code for the model algorithm makes up only a tiny fraction (see the small black box in the middle compared to all the other large boxes). The complexity of ML systems and the fast-paced nature of the industry make it very easy to accumulate technical debt.\n\n\n\n\n\n\nFigure 13.2: ML system components. Credit: Sambasivan et al. (2021a)\n\n\n\n\n13.4.1 Model Boundary Erosion\nUnlike traditional software, ML lacks clear boundaries between components, as seen in the diagram above. This erosion of abstraction creates entanglements that exacerbate technical debt in several ways:\n\n\n13.4.2 Entanglement\nTight coupling between ML model components makes isolating changes difficult. Modifying one part causes unpredictable ripple effects throughout the system. Changing anything changes everything (also known as CACE) is a phenomenon that applies to any tweak you make to your system. Potential mitigations include decomposing the problem when possible or closely monitoring for changes in behavior to contain their impact.\n\n\n13.4.3 Correction Cascades\nThe flowchart in Figure 13.3 depicts the concept of correction cascades in the ML workflow, from problem statement to model deployment. The arcs represent the potential iterative corrections needed at each workflow stage, with different colors corresponding to distinct issues such as interacting with physical world brittleness, inadequate application-domain expertise, conflicting reward systems, and poor cross-organizational documentation. The red arrows indicate the impact of cascades, which can lead to significant revisions in the model development process. In contrast, the dotted red line represents the drastic measure of abandoning the process to restart. This visual emphasizes the complex, interconnected nature of ML system development and the importance of addressing these issues early in the development cycle to mitigate their amplifying effects downstream.\n\n\n\n\n\n\nFigure 13.3: Correction cascades flowchart. Credit: Sambasivan et al. (2021a).\n\n\n———. 2021a. ““Everyone Wants to Do the Model Work, Not the Data Work”: Data Cascades in High-Stakes AI.” In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/3411764.3445518.\n\n\nBuilding models sequentially creates risky dependencies where later models rely on earlier ones. For example, taking an existing model and fine-tuning it for a new use case seems efficient. However, this bakes in assumptions from the original model that may eventually need correction.\nSeveral factors inform the decision to build models sequentially or not:\n\nDataset size and rate of growth: With small, static datasets, fine-tuning existing models often makes sense. For large, growing datasets, training custom models from scratch allows more flexibility to account for new data.\nAvailable computing resources: Fine-tuning requires fewer resources than training large models from scratch. With limited resources, leveraging existing models may be the only feasible approach.\n\nWhile fine-tuning can be efficient, modifying foundational components later becomes extremely costly due to the cascading effects on subsequent models. Careful thought should be given to identifying where introducing fresh model architectures, even with large resource requirements, can avoid correction cascades down the line (see Figure 14.3). There are still scenarios where sequential model building makes sense, which entails weighing these tradeoffs around efficiency, flexibility, and technical debt.\nFigure 13.4 depicts the concept of correction cascades in the ML workflow, from problem statement to model deployment. The arcs represent the potential iterative corrections needed at each stage of the workflow, with different colors corresponding to distinct issues such as interacting with physical world brittleness, inadequate application-domain expertise, conflicting reward systems, and poor cross-organizational documentation. The red arrows indicate the impact of cascades, which can lead to significant revisions in the model development process. In contrast, the dotted red line represents the drastic measure of abandoning the process to restart. This visual emphasizes the complex, interconnected nature of ML system development and the importance of addressing these issues early in the development cycle to mitigate their amplifying effects downstream.\n\n\n\n\n\n\nFigure 13.4: Data cascades. Credit: Sambasivan et al. (2021b).\n\n\nSambasivan, Nithya, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. 2021b. ““Everyone Wants to Do the Model Work, Not the Data Work”: Data Cascades in High-Stakes AI.” In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. CHI ’21. New York, NY, USA: ACM. https://doi.org/10.1145/3411764.3445518.\n\n\n\n\n13.4.4 Undeclared Consumers\nOnce ML model predictions are made available, many downstream systems may silently consume them as inputs for further processing. However, the original model was not designed to accommodate this broad reuse. Due to the inherent opacity of ML systems, it becomes impossible to fully analyze the impact of the model’s outputs as inputs elsewhere. Changes to the model can then have expensive and dangerous consequences by breaking undiscovered dependencies.\nUndeclared consumers can also enable hidden feedback loops if their outputs indirectly influence the original model’s training data. Mitigations include restricting access to predictions, defining strict service contracts, and monitoring for signs of un-modelled influences. Architecting ML systems to encapsulate and isolate their effects limits the risks of unanticipated propagation.\n\n\n13.4.5 Data Dependency Debt\nData dependency debt refers to unstable and underutilized data dependencies, which can have detrimental and hard-to-detect repercussions. While this is a key contributor to tech debt for traditional software, those systems can benefit from the use of widely available tools for static analysis by compilers and linkers to identify dependencies of these types. ML systems need similar tooling.\nOne mitigation for unstable data dependencies is to use versioning, which ensures the stability of inputs but comes with the cost of managing multiple sets of data and the potential for staleness. Another mitigation for underutilized data dependencies is to conduct exhaustive leave-one-feature-out evaluation.\n\n\n13.4.6 Analysis Debt from Feedback Loops\nUnlike traditional software, ML systems can change their behavior over time, making it difficult to analyze pre-deployment. This debt manifests in feedback loops, both direct and hidden.\nDirect feedback loops occur when a model influences its future inputs, such as by recommending products to users that, in turn, shape future training data. Hidden loops arise indirectly between models, such as two systems that interact via real-world environments. Gradual feedback loops are especially hard to detect. These loops lead to analysis debt—the inability to predict how a model will act fully after release. They undermine pre-deployment validation by enabling unmodeled self-influence.\nCareful monitoring and canary deployments help detect feedback. However, fundamental challenges remain in understanding complex model interactions. Architectural choices that reduce entanglement and coupling mitigate analysis debt’s compounding effect.\n\n\n13.4.7 Pipeline Jungles\nML workflows often need more standardized interfaces between components. This leads teams to incrementally “glue” together pipelines with custom code. What emerges are “pipeline jungles”—tangled preprocessing steps that are brittle and resist change. Avoiding modifications to these messy pipelines causes teams to experiment through alternate prototypes. Soon, multiple ways of doing everything proliferate. The need for abstractions and interfaces then impedes sharing, reuse, and efficiency.\nTechnical debt accumulates as one-off pipelines solidify into legacy constraints. Teams sink time into managing idiosyncratic code rather than maximizing model performance. Architectural principles like modularity and encapsulation are needed to establish clean interfaces. Shared abstractions enable interchangeable components, prevent lock-in, and promote best-practice diffusion across teams. Breaking free of pipeline jungles ultimately requires enforcing standards that prevent the accretion of abstraction debt. The benefits of interfaces and APIs that tame complexity outweigh the transitional costs.\n\n\n13.4.8 Configuration Debt\nML systems involve extensive configuration of hyperparameters, architectures, and other tuning parameters. However, the configuration is often an afterthought, needing more rigor and testing—ad hoc configurations increase, amplified by the many knobs available for tuning complex ML models.\nThis accumulation of technical debt has several consequences. Fragile and outdated configurations lead to hidden dependencies and bugs that cause production failures. Knowledge about optimal configurations is isolated rather than shared, leading to redundant work. Reproducing and comparing results becomes difficult when configurations lack documentation. Legacy constraints accumulate as teams fear changing poorly understood configurations.\nAddressing configuration debt requires establishing standards to document, test, validate, and centrally store configurations. Investing in more automated approaches, such as hyperparameter optimization and architecture search, reduces dependence on manual tuning. Better configuration hygiene makes iterative improvement more tractable by preventing complexity from compounding endlessly. The key is recognizing configuration as an integral part of the ML system lifecycle rather than an ad hoc afterthought.\n\n\n13.4.9 The Changing World\nML systems operate in dynamic real-world environments. Thresholds and decisions that are initially effective become outdated as the world evolves. However, legacy constraints make adapting systems to changing populations, usage patterns, and other shifting contextual factors difficult.\nThis debt manifests in two main ways. First, preset thresholds and heuristics require constant re-evaluation and tuning as their optimal values drift. Second, validating systems through static unit and integration tests fails when inputs and behaviors are moving targets.\nResponding to a changing world in real-time with legacy ML systems is challenging. Technical debt accumulates as assumptions decay. The lack of modular architecture and the ability to dynamically update components without side effects exacerbates these issues.\nMitigating this requires building in configurability, monitoring, and modular updatability. Online learning, where models continuously adapt and robust feedback loops to training pipelines, helps automatically tune to the world. However, anticipating and architecting for change is essential to prevent erosion of real-world performance over time.\n\n\n13.4.10 Navigating Technical Debt in Early Stages\nUnderstandably, technical debt accumulates naturally in the early stages of model development. When aiming to build MVP models quickly, teams often need more complete information on what components will reach scale or require modification. Some deferred work is expected.\nHowever, even scrappy initial systems should follow principles like “Flexible Foundations” to avoid painting themselves into corners:\n\nModular code and reusable libraries allow components to be swapped later\nLoose coupling between models, data stores, and business logic facilitates change\nAbstraction layers hide implementation details that may shift over time\nContainerized model serving keeps options open on deployment requirements\n\nDecisions that seem reasonable at the moment can seriously limit future flexibility. For example, baking key business logic into model code rather than keeping it separate makes subsequent model changes extremely difficult.\nWith thoughtful design, though, it is possible to build quickly at first while retaining degrees of freedom to improve. As the system matures, prudent break points emerge where introducing fresh architectures proactively avoids massive rework down the line. This balances urgent timelines with reducing future correction cascades.\n\n\n13.4.11 Summary\nAlthough financial debt is a good metaphor for understanding tradeoffs, it differs from technical debt’s measurability. Technical debt needs to be fully tracked and quantified. This makes it hard for teams to navigate the tradeoffs between moving quickly and inherently introducing more debt versus taking the time to pay down that debt.\nThe Hidden Technical Debt of Machine Learning Systems paper spreads awareness of the nuances of ML system-specific tech debt. It encourages additional development in the broad area of maintainable ML.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1408,7 +1408,7 @@ "section": "13.5 Roles and Responsibilities", "text": "13.5 Roles and Responsibilities\nGiven the vastness of MLOps, successfully implementing ML systems requires diverse skills and close collaboration between people with different areas of expertise. While data scientists build the core ML models, it takes cross-functional teamwork to successfully deploy these models into production environments and enable them to deliver sustainable business value.\nMLOps provides the framework and practices for coordinating the efforts of various roles involved in developing, deploying, and running MLG systems. Bridging traditional silos between data, engineering, and operations teams is key to MLOp’s success. Enabling seamless collaboration through the machine learning lifecycle accelerates benefit realization while ensuring ML models’ long-term reliability and performance.\nWe will look at some key roles involved in MLOps and their primary responsibilities. Understanding the breadth of skills needed to operationalize ML models guides assembling MLOps teams. It also clarifies how the workflows between roles fit under the overarching MLOps methodology.\n\n13.5.1 Data Engineers\nData engineers are responsible for building and maintaining the data infrastructure and pipelines that feed data to ML models. They ensure data is smoothly moved from source systems into the storage, processing, and feature engineering environments needed for ML model development and deployment. Their main responsibilities include:\n\nMigrating raw data from on-prem databases, sensors, and apps into cloud-based data lakes like Amazon S3 or Google Cloud Storage. This provides cost-efficient, scalable storage.\nBuilding data pipelines with workflow schedulers like Apache Airflow, Prefect, and dbt. These extract data from sources, transform and validate data, and load it into destinations like data warehouses, feature stores, or directly for model training.\nTransforming messy, raw data into structured, analysis-ready datasets. This includes handling null or malformed values, deduplicating, joining disparate data sources, aggregating data, and engineering new features.\nMaintaining data infrastructure components like cloud data warehouses (Snowflake, Redshift, BigQuery), data lakes, and metadata management systems. Provisioning and optimizing data processing systems.\nEstablishing data versioning, backup, and archival processes for ML datasets and features and enforcing data governance policies.\n\nFor example, a manufacturing firm may use Apache Airflow pipelines to extract sensor data from PLCs on the factory floor into an Amazon S3 data lake. The data engineers would then process this raw data to filter, clean, and join it with product metadata. These pipeline outputs would then load into a Snowflake data warehouse from which features can be read for model training and prediction.\nThe data engineering team builds and sustains the data foundation for reliable model development and operations. Their work enables data scientists and ML engineers to focus on building, training, and deploying ML models at scale.\n\n\n13.5.2 Data Scientists\nThe job of the data scientists is to focus on the research, experimentation, development, and continuous improvement of ML models. They leverage their expertise in statistics, modeling, and algorithms to create high-performing models. Their main responsibilities include:\n\nWorking with business and data teams to identify opportunities where ML can add value, framing the problem, and defining success metrics.\nPerforming exploratory data analysis to understand relationships in data, derive insights, and identify relevant features for modeling.\nResearching and experimenting with different ML algorithms and model architectures based on the problem and data characteristics and leveraging libraries like TensorFlow, PyTorch, and Keras.\nTo maximize performance, train and fine-tune models by tuning hyperparameters, adjusting neural network architectures, feature engineering, etc.\nEvaluating model performance through metrics like accuracy, AUC, and F1 scores and performing error analysis to identify areas for improvement.\nDeveloping new model versions by incorporating new data, testing different approaches, optimizing model behavior, and maintaining documentation and lineage for models.\n\nFor example, a data scientist may leverage TensorFlow and TensorFlow Probability to develop a demand forecasting model for retail inventory planning. They would iterate on different sequence models like LSTMs and experiment with features derived from product, sales, and seasonal data. The model would be evaluated based on error metrics versus actual demand before deployment. The data scientist monitors performance and retrains/enhances the model as new data comes in.\nData scientists drive model creation, improvement, and innovation through their expertise in ML techniques. They collaborate closely with other roles to ensure models create maximum business impact.\n\n\n13.5.3 ML Engineers\nML engineers enable models data scientists develop to be productized and deployed at scale. Their expertise makes models reliably serve predictions in applications and business processes. Their main responsibilities include:\n\nTaking prototype models from data scientists and hardening them for production environments through coding best practices.\nBuilding APIs and microservices for model deployment using tools like Flask, FastAPI. Containerizing models with Docker.\nManage model versions, sync new models into production using CI/CD pipelines, and implement canary releases, A/B tests, and rollback procedures.\nOptimizing model performance for high scalability, low latency, and cost efficiency. Leveraging compression, quantization, and multi-model serving.\nMonitor models once in production and ensure continued reliability and accuracy. Retraining models periodically.\n\nFor example, an ML engineer may take a TensorFlow fraud detection model developed by data scientists and containerize it using TensorFlow Serving for scalable deployment. The model would be integrated into the company’s transaction processing pipeline via APIs. The ML engineer implements a model registry and CI/CD pipeline using MLFlow and Jenkins to deploy model updates reliably. The ML engineers then monitor the running model for continued performance using tools like Prometheus and Grafana. If model accuracy drops, they initiate retraining and deployment of a new model version.\nThe ML engineering team enables data science models to progress smoothly into sustainable and robust production systems. Their expertise in building modular, monitored systems delivers continuous business value.\n\n\n13.5.4 DevOps Engineers\nDevOps engineers enable MLOps by building and managing the underlying infrastructure for developing, deploying, and monitoring ML models. They provide the cloud architecture and automation pipelines. Their main responsibilities include:\n\nProvisioning and managing cloud infrastructure for ML workflows using IaC tools like Terraform, Docker, and Kubernetes.\nDeveloping CI/CD pipelines for model retraining, validation, and deployment. Integrating ML tools into the pipeline, such as MLflow and Kubeflow.\nMonitoring model and infrastructure performance using tools like Prometheus, Grafana, ELK stack. Building alerts and dashboards.\nImplement governance practices around model development, testing, and promotion to enable reproducibility and traceability.\nEmbedding ML models within applications. They are exposing models via APIs and microservices for integration.\nOptimizing infrastructure performance and costs and leveraging autoscaling, spot instances, and availability across regions.\n\nFor example, a DevOps engineer provisions a Kubernetes cluster on AWS using Terraform to run ML training jobs and online deployment. They build a CI/CD pipeline in Jenkins, which triggers model retraining if new data is available. After automated testing, the model is registered with MLflow and deployed in the Kubernetes cluster. The engineer then monitors cluster health, container resource usage, and API latency using Prometheus and Grafana.\nThe DevOps team enables rapid experimentation and reliable deployments for ML through cloud, automation, and monitoring expertise. Their work maximizes model impact while minimizing technical debt.\n\n\n13.5.5 Project Managers\nProject managers play a vital role in MLOps by coordinating the activities between the teams involved in delivering ML projects. They help drive alignment, accountability, and accelerated results. Their main responsibilities include:\n\nWorking with stakeholders to define project goals, success metrics, timelines, and budgets; outlining specifications and scope.\nCreating a project plan spanning data acquisition, model development, infrastructure setup, deployment, and monitoring.\nCoordinating design, development, and testing efforts between data engineers, data scientists, ML engineers, and DevOps roles.\nTracking progress and milestones, identifying roadblocks and resolving them through corrective actions, and managing risks and issues.\nFacilitating communication through status reports, meetings, workshops, and documentation and enabling seamless collaboration.\nDriving adherence to timelines and budget and escalating anticipated overruns or shortfalls for mitigation.\n\nFor example, a project manager would create a project plan for developing and enhancing a customer churn prediction model. They coordinate between data engineers building data pipelines, data scientists experimenting with models, ML engineers productionalizing models, and DevOps setting up deployment infrastructure. The project manager tracks progress via milestones like dataset preparation, model prototyping, deployment, and monitoring. To enact preventive solutions, they surface any risks, delays, or budget issues.\nSkilled project managers enable MLOps teams to work synergistically to rapidly deliver maximum business value from ML investments. Their leadership and organization align with diverse teams.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1419,7 +1419,7 @@ "section": "13.6 Embedded System Challenges", "text": "13.6 Embedded System Challenges\nWe will briefly review the challenges with embedded systems so that it sets the context for the specific challenges that emerge with embedded MLOps, which we will discuss in the following section.\n\n13.6.1 Limited Compute Resources\nEmbedded devices like microcontrollers and mobile phones have much more constrained computing power than data center machines or GPUs. A typical microcontroller may have only KB of RAM, MHz CPU speed, and no GPU. For example, a microcontroller in a smartwatch may only have a 32-bit processor running at 120MHz with 320KB of RAM (Stm32L4Q5Ag 2021). This allows simple ML models like small linear regressions or random forests, but more complex deep neural networks would be infeasible. Strategies to mitigate this include quantization, pruning, efficient model architectures, and offloading certain computations to the cloud when connectivity allows.\n\nStm32L4Q5Ag. 2021. STMicroelectronics.\n\n\n13.6.2 Constrained Memory\nStoring large ML models and datasets directly on embedded devices is often infeasible with limited memory. For example, a deep neural network model can easily take hundreds of MB, which exceeds the storage capacity of many embedded systems. Consider this example. A wildlife camera that captures images to detect animals may have only a 2GB memory card. More is needed to store a deep learning model for image classification that is often hundreds of MB in size. Consequently, this requires optimization of memory usage through weights compression, lower-precision numerics, and streaming inference pipelines.\n\n\n13.6.3 Intermittent Connectivity\nMany embedded devices operate in remote environments without reliable internet connectivity. We must rely on something other than constant cloud access for convenient retraining, monitoring, and deployment. Instead, we need smart scheduling and caching strategies to optimize for intermittent connections. For example, a model predicting crop yield on a remote farm may need to make predictions daily but only have connectivity to the cloud once a week when the farmer drives into town. The model needs to operate independently in between connections.\n\n\n13.6.4 Power Limitations\nEmbedded devices like phones, wearables, and remote sensors are battery-powered. Continual inference and communication can quickly drain those batteries, limiting functionality. For example, a smart collar tagging endangered animals runs on a small battery. Continuously running a GPS tracking model would drain the battery within days. The collar has to schedule when to activate the model carefully. Thus, embedded ML has to manage tasks carefully to conserve power. Techniques include optimized hardware accelerators, prediction caching, and adaptive model execution.\n\n\n13.6.5 Fleet Management\nFor mass-produced embedded devices, millions of units can be deployed in the field to orchestrate updates. Hypothetically, updating a fraud detection model on 100 million (future smart) credit cards requires securely pushing updates to each distributed device rather than a centralized data center. Such a distributed scale makes fleet-wide management much harder than a centralized server cluster. It requires intelligent protocols for over-the-air updates, handling connectivity issues, and monitoring resource constraints across devices.\n\n\n13.6.6 On-Device Data Collection\nCollecting useful training data requires engineering both the sensors on the device and the software pipelines. This is unlike servers, where we can pull data from external sources. Challenges include handling sensor noise. Sensors on an industrial machine detect vibrations and temperature to predict maintenance needs. This requires tuning the sensors and sampling rates to capture useful data.\n\n\n13.6.7 Device-Specific Personalization\nA smart speaker learns an individual user’s voice patterns and speech cadence to improve recognition accuracy while protecting privacy. Adapting ML models to specific devices and users is important, but this poses privacy challenges. On-device learning allows personalization without transmitting as much private data. However, balancing model improvement, privacy preservation, and constraints requires novel techniques.\n\n\n13.6.8 Safety Considerations\nIf extremely large embedded ML in systems like self-driving vehicles is not engineered carefully, there are serious safety risks. To ensure safe operation before deployment, self-driving cars must undergo extensive track testing in simulated rain, snow, and obstacle scenarios. This requires extensive validation, fail-safes, simulators, and standards compliance before deployment.\n\n\n13.6.9 Diverse Hardware Targets\nThere is a diverse range of embedded processors, including ARM, x86, specialized AI accelerators, FPGAs, etc. Supporting this heterogeneity makes deployment challenging. We need strategies like standardized frameworks, extensive testing, and model tuning for each platform. For example, an object detection model needs efficient implementations across embedded devices like a Raspberry Pi, Nvidia Jetson, and Google Edge TPU.\n\n\n13.6.10 Testing Coverage\nRigorously testing edge cases is difficult with constrained embedded simulation resources, but exhaustive testing is critical in systems like self-driving cars. Exhaustively testing an autopilot model requires millions of simulated kilometers, exposing it to rare events like sensor failures. Therefore, strategies like synthetic data generation, distributed simulation, and chaos engineering help improve coverage.\n\n\n13.6.11 Concept Drift Detection\nWith limited monitoring data from each remote device, detecting changes in the input data over time is much harder. Drift can lead to degraded model performance. Lightweight methods are needed to identify when retraining is necessary. A model predicting power grid loads shows declining performance as usage patterns change over time. With only local device data, this trend is difficult to spot.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1430,7 +1430,7 @@ "section": "13.7 Traditional MLOps vs. Embedded MLOps", "text": "13.7 Traditional MLOps vs. Embedded MLOps\nIn traditional MLOps, ML models are typically deployed in cloud-based or server environments, with abundant resources like computing power and memory. These environments facilitate the smooth operation of complex models that require significant computational resources. For instance, a cloud-based image recognition model might be used by a social media platform to tag photos with relevant labels automatically. In this case, the model can leverage the extensive resources available in the cloud to efficiently process vast amounts of data.\nOn the other hand, embedded MLOps involves deploying ML models on embedded systems, specialized computing systems designed to perform specific functions within larger systems. Embedded systems are typically characterized by their limited computational resources and power. For example, an ML model might be embedded in a smart thermostat to optimize heating and cooling based on the user’s preferences and habits. The model must be optimized to run efficiently on the thermostat’s limited hardware without compromising its performance or accuracy.\nThe key difference between traditional and embedded MLOps lies in the embedded system’s resource constraints. While traditional MLOps can leverage abundant cloud or server resources, embedded MLOps must contend with the hardware limitations on which the model is deployed. This requires careful optimization and fine-tuning of the model to ensure it can deliver accurate and valuable insights within the embedded system’s constraints.\nFurthermore, embedded MLOps must consider the unique challenges posed by integrating ML models with other embedded system components. For example, the model must be compatible with the system’s software and hardware and must be able to interface seamlessly with other components, such as sensors or actuators. This requires a deep understanding of both ML and embedded systems and close collaboration between data scientists, engineers, and other stakeholders.\nSo, while traditional MLOps and embedded MLOps share the common goal of deploying and maintaining ML models in production environments, the unique challenges posed by embedded systems require a specialized approach. Embedded MLOps must carefully balance the need for model accuracy and performance with the constraints of the hardware on which the model is deployed. This requires a deep understanding of both ML and embedded systems and close collaboration between various stakeholders to ensure the successful integration of ML models into embedded systems.\nThis time, we will group the subtopics under broader categories to streamline the structure of our thought process on MLOps. This structure will help you understand how different aspects of MLOps are interconnected and why each is important for the efficient operation of ML systems as we discuss the challenges in the context of embedded systems.\n\nModel Lifecycle Management\n\nData Management: Handling data ingestion, validation, and version control.\nModel Training: Techniques and practices for effective and scalable model training.\nModel Evaluation: Strategies for testing and validating model performance.\nModel Deployment: Approaches for deploying models into production environments.\n\nDevelopment and Operations Integration\n\nCI/CD Pipelines: Integrating ML models into continuous integration and deployment pipelines.\nInfrastructure Management: Setting up and maintaining the infrastructure required for training and deploying models.\nCommunication & Collaboration: Ensuring smooth communication and collaboration between data scientists, ML engineers, and operations teams.\n\nOperational Excellence\n\nMonitoring: Techniques for monitoring model performance, data drift, and operational health.\nGovernance: Implementing policies for model auditability, compliance, and ethical considerations.\n\n\n\n13.7.1 Model Lifecycle Management\n\nData Management\nIn traditional centralized MLOps, data is aggregated into large datasets and data lakes, then processed on cloud or on-prem servers. However, embedded MLOps relies on decentralized data from local on-device sensors. Devices collect smaller batches of incremental data, often noisy and unstructured. With connectivity constraints, this data cannot always be instantly transmitted to the cloud and needs to be intelligently cached and processed at the edge.\nDue to limited on-device computing, embedded devices can only preprocess and clean data minimally before transmission. Early filtering and processing occur at edge gateways to reduce transmission loads. While leveraging cloud storage, more processing and storage happen at the edge to account for intermittent connectivity. Devices identify and transmit only the most critical subsets of data to the cloud.\nLabeling also needs centralized data access, requiring more automated techniques like federated learning, where devices collaboratively label peers’ data. With personal edge devices, data privacy and regulations are critical concerns. Data collection, transmission, and storage must be secure and compliant.\nFor instance, a smartwatch may collect the day’s step count, heart rate, and GPS coordinates. This data is cached locally and transmitted to an edge gateway when WiFi is available—the gateway processes and filters data before syncing relevant subsets with the cloud platform to retrain models.\n\n\nModel Training\nIn traditional centralized MLOps, models are trained using abundant data via deep learning on high-powered cloud GPU servers. However, embedded MLOps need more support in model complexity, data availability, and computing resources for training.\nThe volume of aggregated data is much lower, often requiring techniques like federated learning across devices to create training sets. The specialized nature of edge data also limits public datasets for pre-training. With privacy concerns, data samples must be tightly controlled and anonymized where possible.\nFurthermore, the models must use simplified architectures optimized for low-power edge hardware. Given the computing limitations, high-end GPUs are inaccessible for intensive deep learning. Training leverages lower-powered edge servers and clusters with distributed approaches to spread load.\nStrategies like transfer learning become essential to mitigate data scarcity and irregularity (see Figure 14.5). Models can pre-train on large public datasets and then finetune the training on limited domain-specific edge data. Even incremental on-device learning to customize models helps overcome the decentralized nature of embedded data. The lack of broad labeled data also motivates semi-supervised techniques.\nFigure 13.5 illustrates the concept of transfer learning in model training within an MLOps framework. It showcases a neural network where the initial layers (W_{A1} to W_{A4}), which are responsible for general feature extraction, are frozen (indicated by the green dashed line), meaning their weights are not updated during training. This reuse of pre-trained layers accelerates learning by utilizing knowledge gained from previous tasks. The latter layers (W_{A5} to W_{A7}), depicted beyond the blue dashed line, are finetuned for the specific task at hand, focusing on task-specific feature learning. This approach allows the model to adapt to the new task using fewer resources and potentially achieve higher performance on specialized tasks by reusing the general features learned from a broader dataset.\n\n\n\n\n\n\nFigure 13.5: Transfer learning in MLOps. Credit: HarvardX.\n\n\n\nFor example, a smart home assistant may pre-train an audio recognition model on public YouTube clips, which helps bootstrap with general knowledge. It then transfers learning to a small sample of home data to classify customized appliances and events, specializing in the model. The model transforms into a lightweight neural network optimized for microphone-enabled devices across the home.\nSo, embedded MLOps face acute challenges in constructing training datasets, designing efficient models, and distributing compute for model development compared to traditional settings. Given the embedded constraints, careful adaptation, such as transfer learning and distributed training, is required to train models.\n\n\nModel Evaluation\nIn traditional centralized MLOps, models are evaluated primarily using accuracy metrics and holdout test datasets. However, embedded MLOps require a more holistic evaluation that accounts for system constraints beyond accuracy.\nModels must be tested early and often on deployed edge hardware covering diverse configurations. In addition to accuracy, factors like latency, CPU usage, memory footprint, and power consumption are critical evaluation criteria. Models are selected based on tradeoffs between these metrics to meet edge device constraints.\nData drift must also be monitored - where models trained on cloud data degrade in accuracy over time on local edge data. Embedded data often has more variability than centralized training sets. Evaluating models across diverse operational edge data samples is key. But sometimes, getting the data for monitoring the drift can be challenging if these devices are in the wild and communication is a barrier.\nOngoing monitoring provides visibility into real-world performance post-deployment, revealing bottlenecks not caught during testing. For instance, a smart camera model update may be canary tested on 100 cameras first and rolled back if degraded accuracy is observed before expanding to all 5000 cameras.\n\n\nModel Deployment\nIn traditional MLOps, new model versions are directly deployed onto servers via API endpoints. However, embedded devices require optimized delivery mechanisms to receive updated models. Over-the-air (OTA) updates provide a standardized approach to wirelessly distributing new software or firmware releases to embedded devices. Rather than direct API access, OTA packages allow remote deploying models and dependencies as pre-built bundles. Alternatively, federated learning allows model updates without direct access to raw training data. This decentralized approach has the potential for continuous model improvement but needs robust MLOps platforms.\nModel delivery relies on physical interfaces like USB or UART serial connections for deeply embedded devices lacking connectivity. The model packaging still follows similar principles to OTA updates, but the deployment mechanism is tailored to the capabilities of the edge hardware. Moreover, specialized OTA protocols optimized for IoT networks are often used rather than standard WiFi or Bluetooth protocols. Key factors include efficiency, reliability, security, and telemetry, such as progress tracking—solutions like Mender. Io provides embedded-focused OTA services handling differential updates across device fleets.\nFigure 13.6 presents an overview of Model Lifecycle Management in an MLOps context, illustrating the flow from development (top left) to deployment and monitoring (bottom right). The process begins with ML Development, where code and configurations are version-controlled. Data and model management are central to the process, involving datasets and feature repositories. Continuous training, model conversion, and model registry are key stages in the operationalization of training. The model deployment includes serving the model and managing serving logs. Alerting mechanisms are in place to flag issues, which feed into continuous monitoring to ensure model performance and reliability over time. This integrated approach ensures that models are developed and maintained effectively throughout their lifecycle.\n\n\n\n\n\n\nFigure 13.6: Model lifecycle management. Credit: HarvardX.\n\n\n\n\n\n\n13.7.2 Development and Operations Integration\n\nCI/CD Pipelines\nIn traditional MLOps, robust CI/CD infrastructure like Jenkins and Kubernetes enables pipeline automation for large-scale model deployment. However, embedded MLOps need this centralized infrastructure and more tailored CI/CD workflows for edge devices.\nBuilding CI/CD pipelines has to account for a fragmented landscape of diverse hardware, firmware versions, and connectivity constraints. There is no standard platform to orchestrate pipelines, and tooling support is more limited.\nTesting must cover this wide spectrum of target embedded devices early, which is difficult without centralized access. Companies must invest significant effort into acquiring and managing test infrastructure across the heterogeneous embedded ecosystem.\nOver-the-air updates require setting up specialized servers to distribute model bundles securely to devices in the field. Rollout and rollback procedures must also be carefully tailored for particular device families.\nWith traditional CI/CD tools less applicable, embedded MLOps rely more on custom scripts and integration. Companies take varied approaches, from open-source frameworks to fully in-house solutions. Tight integration between developers, edge engineers, and end customers establishes trusted release processes.\nTherefore, embedded MLOps can’t leverage centralized cloud infrastructure for CI/CD. Companies combine custom pipelines, testing infrastructure, and OTA delivery to deploy models across fragmented and disconnected edge systems.\n\n\nInfrastructure Management\nIn traditional centralized MLOps, infrastructure entails provisioning cloud servers, GPUs, and high-bandwidth networks for intensive workloads like model training and serving predictions at scale. However, embedded MLOps require more heterogeneous infrastructure spanning edge devices, gateways, and the cloud.\nEdge devices like sensors capture and preprocess data locally before intermittent transmission to avoid overloading networks—gateways aggregate and process device data before sending select subsets to the cloud for training and analysis. The cloud provides centralized management and supplemental computing.\nThis infrastructure needs tight integration and balancing processing and communication loads. Network bandwidth is limited, requiring careful data filtering and compression. Edge computing capabilities are modest compared to the cloud, imposing optimization constraints.\nManaging secure OTA updates across large device fleets presents challenges at the edge. Rollouts must be incremental and rollback-ready for quick mitigation. Given decentralized environments, updating edge infrastructure requires coordination.\nFor example, an industrial plant may perform basic signal processing on sensors before sending data to an on-prem gateway. The gateway handles data aggregation, infrastructure monitoring, and OTA updates. Only curated data is transmitted to the cloud for advanced analytics and model retraining.\nEmbedded MLOps requires holistic management of distributed infrastructure spanning constrained edge, gateways, and centralized cloud. Workloads are balanced across tiers while accounting for connectivity, computing, and security challenges.\n\n\nCommunication & Collaboration\nIn traditional MLOps, collaboration tends to center around data scientists, ML engineers, and DevOps teams. However, embedded MLOps require tighter cross-functional coordination between additional roles to address system constraints.\nEdge engineers optimize model architectures for target hardware environments. They provide feedback to data scientists during development so models fit device capabilities early on. Similarly, product teams define operational requirements informed by end-user contexts.\nWith more stakeholders across the embedded ecosystem, communication channels must facilitate information sharing between centralized and remote teams. Issue tracking and project management ensure alignment.\nCollaborative tools optimize models for particular devices. Data scientists can log issues replicated from field devices so models specialize in niche data. Remote device access aids debugging and data collection.\nFor example, data scientists may collaborate with field teams managing fleets of wind turbines to retrieve operational data samples. This data is used to specialize models detecting anomalies specific to that turbine class. Model updates are tested in simulations and reviewed by engineers before field deployment.\nEmbedded MLOps mandates continuous coordination between data scientists, engineers, end customers, and other stakeholders throughout the ML lifecycle. Through close collaboration, models can be tailored and optimized for targeted edge devices.\n\n\n\n13.7.3 Operational Excellence\n\nMonitoring\nTraditional MLOps monitoring focuses on centrally tracking model accuracy, performance metrics, and data drift. However, embedded MLOps must account for decentralized monitoring across diverse edge devices and environments.\nEdge devices require optimized data collection to transmit key monitoring metrics without overloading networks. Metrics help assess model performance, data patterns, resource usage, and other behaviors on remote devices.\nWith limited connectivity, more analysis occurs at the edge before aggregating insights centrally. Gateways play a key role in monitoring fleet health and coordinating software updates. Confirmed indicators are eventually propagated to the cloud.\nBroad device coverage is challenging but critical. Issues specific to certain device types may arise, so monitoring needs to cover the full spectrum. Canary deployments help trial monitoring processes before scaling.\nAnomaly detection identifies incidents requiring rolling back models or retraining on new data. However, interpreting alerts requires understanding unique device contexts based on input from engineers and customers.\nFor example, an automaker may monitor autonomous vehicles for indicators of model degradation using caching, aggregation, and real-time streams. Engineers assess when identified anomalies warrant OTA updates to improve models based on factors like location and vehicle age.\nEmbedded MLOps monitoring provides observability into model and system performance across decentralized edge environments. Careful data collection, analysis, and collaboration deliver meaningful insights to maintain reliability.\n\n\nGovernance\nIn traditional MLOps, governance focuses on model explainability, fairness, and compliance for centralized systems. However, embedded MLOps must also address device-level governance challenges related to data privacy, security, and safety.\nWith sensors collecting personal and sensitive data, local data governance on devices is critical. Data access controls, anonymization, and encrypted caching help address privacy risks and compliance like HIPAA and GDPR. Updates must maintain security patches and settings.\nSafety governance considers the physical impacts of flawed device behavior. Failures could cause unsafe conditions in vehicles, factories, and critical systems. Redundancy, fail-safes, and warning systems help mitigate risks.\nTraditional governance, such as bias monitoring and model explainability, remains imperative but is harder to implement for embedded AI. Peeking into black-box models on low-power devices also poses challenges.\nFor example, a medical device may scrub personal data on the device before transmission. Strict data governance protocols approve model updates. Model explainability is limited, but the focus is on detecting anomalous behavior. Backup systems prevent failures.\nEmbedded MLOps governance must encompass privacy, security, safety, transparency, and ethics. Specialized techniques and team collaboration are needed to help establish trust and accountability within decentralized environments.\n\n\n\n13.7.4 Comparison\nHere is a comparison table highlighting similarities and differences between Traditional MLOps and Embedded MLOps based on all the things we have learned thus far:\n\n\n\n\n\n\n\n\nArea\nTraditional MLOps\nEmbedded MLOps\n\n\n\n\nData Management\nLarge datasets, data lakes, feature stores\nOn-device data capture, edge caching and processing\n\n\nModel Development\nLeverage deep learning, complex neural nets, GPU training\nConstraints on model complexity, need for optimization\n\n\nDeployment\nServer clusters, cloud deployment, low latency at scale\nOTA deployment to devices, intermittent connectivity\n\n\nMonitoring\nDashboards, logs, alerts for cloud model performance\nOn-device monitoring of predictions, resource usage\n\n\nRetraining\nRetrain models on new data\nFederated learning from devices, edge retraining\n\n\nInfrastructure\nDynamic cloud infrastructure\nHeterogeneous edge/cloud infrastructure\n\n\nCollaboration\nShared experiment tracking and model registry\nCollaboration for device-specific optimization\n\n\n\nSo, while Embedded MLOps shares foundational MLOps principles, it faces unique constraints in tailoring workflows and infrastructure specifically for resource-constrained edge devices.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1441,7 +1441,7 @@ "section": "13.8 Commercial Offerings", "text": "13.8 Commercial Offerings\nWhile understanding the principles is not a substitute for understanding them, an increasing number of commercial offerings help ease the burden of building ML pipelines and integrating tools to build, test, deploy, and monitor ML models in production.\n\n13.8.1 Traditional MLOps\nGoogle, Microsoft, and Amazon offer their version of managed ML services. These include services that manage model training and experimentation, model hosting and scaling, and monitoring. These offerings are available via an API and client SDKs, as well as through web UIs. While it is possible to build your own end-to-end MLOps solutions using pieces from each, the greatest ease of use benefits come by staying within a single provider ecosystem to take advantage of interservice integrations.\nI will provide a quick overview of the services that fit into each part of the MLOps life cycle described above, providing examples of offerings from different providers. The space is moving very quickly; new companies and products are entering the scene very rapidly, and these are not meant to serve as an endorsement of a particular company’s offering.\n\nData Management\nData storage and versioning are table stakes for any commercial offering, and most take advantage of existing general-purpose storage solutions such as S3. Others use more specialized options such as git-based storage (Example: Hugging Face’s Dataset Hub This is an area where providers make it easy to support their competitors’ data storage options, as they don’t want this to be a barrier for adoptions of the rest of their MLOps services. For example, Vertex AI’s training pipeline seamlessly supports datasets stored in S3, Google Cloud Buckets, or Hugging Face’s Dataset Hub.\n\n\nModel Training\nManaged training services are where cloud providers shine, as they provide on-demand access to hardware that is out of reach for most smaller companies. They bill only for hardware during training time, putting GPU-accelerated training within reach of even the smallest developer teams. The control developers have over their training workflow can vary widely depending on their needs. Some providers have services that provide little more than access to the resources and rely on the developer to manage the training loop, logging, and model storage themselves. Other services are as simple as pointing to a base model and a labeled data set to kick off a fully managed finetuning job (example: Vertex AI Fine Tuning).\nA word of warning: As of 2023, GPU hardware demand well exceeds supply, and as a result, cloud providers are rationing access to their GPUs. In some data center regions, GPUs may be unavailable or require long-term contracts.\n\n\nModel Evaluation\nModel evaluation tasks typically involve monitoring models’ accuracy, latency, and resource usage in both the testing and production phases. Unlike embedded systems, ML models deployed to the cloud benefit from constant internet connectivity and unlimited logging capacities. As a result, it is often feasible to capture and log every request and response. This makes replaying or generating synthetic requests to compare different models and versions tractable.\nSome providers also offer services that automate the experiment tracking of modifying model hyperparameters. They track the runs and performance and generate artifacts from these model training runs. Example: WeightsAndBiases\n\n\nModel Deployment\nEach provider typically has a service referred to as a “model registry,” where training models are stored and accessed. Often, these registries may also provide access to base models that are either open source or provided by larger technology companies (or, in some cases, like LLAMA, both!). These model registries are a common place to compare all the models and their versions to allow easy decision-making on which to pick for a given use case. Example: Vertex AI’s model registry\nFrom the model registry, deploying a model to an inference endpoint is quick and simple, and it handles the resource provisioning, model weight downloading, and hosting of a given model. These services typically give access to the model via a REST API where inference requests can be sent. Depending on the model type, specific resources can be configured, such as which type of GPU accelerator may be needed to hit the desired performance. Some providers may also offer serverless inference or batch inference options that do not need a persistent endpoint to access the model. Example: AWS SageMaker Inference\n\n\n\n13.8.2 Embedded MLOps\nDespite the proliferation of new ML Ops tools in response to the increase in demand, the challenges described earlier have constrained the availability of such tools in embedded systems environments. More recently, new tools such as Edge Impulse (Janapa Reddi et al. 2023) have made the development process somewhat easier, as described below.\n\nJanapa Reddi, Vijay, Alexander Elium, Shawn Hymel, David Tischler, Daniel Situnayake, Carl Ward, Louis Moreau, et al. 2023. “Edge Impulse: An MLOps Platform for Tiny Machine Learning.” Proceedings of Machine Learning and Systems 5.\n\nEdge Impulse\nEdge Impulse is an end-to-end development platform for creating and deploying machine learning models onto edge devices such as microcontrollers and small processors. It aims to make embedded machine learning more accessible to software developers through its easy-to-use web interface and integrated tools for data collection, model development, optimization, and deployment. Its key capabilities include the following:\n\nIntuitive drag-and-drop workflow for building ML models without coding required\nTools for acquiring, labeling, visualizing, and preprocessing data from sensors\nChoice of model architectures, including neural networks and unsupervised learning\nModel optimization techniques to balance performance metrics and hardware constraints\n\nSeamless deployment onto edge devices through compilation, SDKs, and benchmarks\nCollaboration features for teams and integration with other platforms\n\nWith Edge Impulse, developers with limited data science expertise can develop specialized ML models that run efficiently within small computing environments. It provides a comprehensive solution for creating embedded intelligence and advancing machine learning.\n\nUser Interface\nEdge Impulse was designed with seven key principles: accessibility, end-to-end capabilities, a data-centric approach, interactiveness, extensibility, team orientation, and community support. The intuitive user interface, shown in Figure 13.7, guides developers at all experience levels through uploading data, selecting a model architecture, training the model, and deploying it across relevant hardware platforms. It should be noted that, like any tool, Edge Impulse is intended to assist with, not replace, foundational considerations such as determining if ML is an appropriate solution or acquiring the requisite domain expertise for a given application.\n\n\n\n\n\n\nFigure 13.7: Screenshot of Edge Impulse user interface for building workflows from input data to output features.\n\n\n\nWhat makes Edge Impulse notable is its comprehensive yet intuitive end-to-end workflow. Developers start by uploading their data through file upload or command line interface (CLI) tools, after which they can examine raw samples and visualize the data distribution in the training and test splits. Next, users can pick from various preprocessing “blocks” to facilitate digital signal processing (DSP). While default parameter values are provided, users can customize the parameters as needed, with considerations around memory and latency displayed. Users can easily choose their neural network architecture - without any code needed.\nThanks to the platform’s visual editor, users can customize the architecture’s components and specific parameters while ensuring that the model is still trainable. Users can also leverage unsupervised learning algorithms, such as K-means clustering and Gaussian mixture models (GMM).\n\n\nOptimizations\nTo accommodate the resource constraints of TinyML applications, Edge Impulse provides a confusion matrix summarizing key performance metrics, including per-class accuracy and F1 scores. The platform elucidates the tradeoffs between model performance, size, and latency using simulations in Renode and device-specific benchmarking. For streaming data use cases, a performance calibration tool leverages a genetic algorithm to find ideal post-processing configurations balancing false acceptance and false rejection rates. Techniques like quantization, code optimization, and device-specific optimization are available to optimize models. For deployment, models can be compiled in appropriate formats for target edge devices. Native firmware SDKs also enable direct data collection on devices.\nIn addition to streamlining development, Edge Impulse scales the modeling process itself. A key capability is the EON Tuner, an automated machine learning (AutoML) tool that assists users in hyperparameter tuning based on system constraints. It runs a random search to generate configurations for digital signal processing and training steps quickly. The resulting models are displayed for the user to select based on relevant performance, memory, and latency metrics. For data, active learning facilitates training on a small labeled subset, followed by manually or automatically labeling new samples based on proximity to existing classes. This expands data efficiency.\n\n\nUse Cases\nBeyond the accessibility of the platform itself, the Edge Impulse team has expanded the knowledge base of the embedded ML ecosystem. The platform lends itself to academic environments, having been used in online courses and on-site workshops globally. Numerous case studies featuring industry and research use cases have been published, most notably Oura Ring, which uses ML to identify sleep patterns. The team has made repositories open source on GitHub, facilitating community growth. Users can also make projects public to share techniques and download libraries to share via Apache. Organization-level access enables collaboration on workflows.\nOverall, Edge Impulse is uniquely comprehensive and integrateable for developer workflows. Larger platforms like Google and Microsoft focus more on cloud versus embedded systems. TinyMLOps frameworks such as Neuton AI and Latent AI offer some functionality but lack Edge Impulse’s end-to-end capabilities. TensorFlow Lite Micro is the standard inference engine due to flexibility, open source status, and TensorFlow integration, but it uses more memory and storage than Edge Impulse’s EON Compiler. Other platforms need to be updated, academic-focused, or more versatile. In summary, Edge Impulse aims to streamline and scale embedded ML through an accessible, automated platform.\n\n\n\nLimitations\nWhile Edge Impulse provides an accessible pipeline for embedded ML, important limitations and risks remain. A key challenge is data quality and availability - the models are only as good as the data used to train them. Users must have sufficient labeled samples that capture the breadth of expected operating conditions and failure modes. Labeled anomalies and outliers are critical yet time-consuming to collect and identify. Insufficient or biased data leads to poor model performance regardless of the tool’s capabilities.\nDeploying low-powered devices also presents inherent challenges. Optimized models may still need to be more resource-intensive for ultra-low-power MCUs. Striking the right balance of compression versus accuracy takes some experimentation. The tool simplifies but still needs to eliminate the need for foundational ML and signal processing expertise. Embedded environments also constrain debugging and interpretability compared to the cloud.\nWhile impressive results are achievable, users shouldn’t view Edge Impulse as a “Push Button ML” solution. Careful project scoping, data collection, model evaluation, and testing are still essential. As with any development tool, reasonable expectations and diligence in application are advised. However, Edge Impulse can accelerate embedded ML prototyping and deployment for developers willing to invest the requisite data science and engineering effort.\n\n\n\n\n\n\nExercise 13.1: Edge Impulse\n\n\n\n\n\nReady to level up your tiny machine-learning projects? Let’s combine the power of Edge Impulse with the awesome visualizations of Weights & Biases (WandB). In this Colab, you’ll learn to track your model’s training progress like a pro! Imagine seeing cool graphs of your model getting smarter, comparing different versions, and ensuring your AI performs its best even on tiny devices.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1452,7 +1452,7 @@ "section": "13.9 Case Studies", "text": "13.9 Case Studies\n\n13.9.1 Oura Ring\nThe Oura Ring is a wearable that can measure activity, sleep, and recovery when placed on the user’s finger. Using sensors to track physiological metrics, the device uses embedded ML to predict the stages of sleep. To establish a baseline of legitimacy in the industry, Oura conducted a correlation experiment to evaluate the device’s success in predicting sleep stages against a baseline study. This resulted in a solid 62% correlation compared to the 82-83% baseline. Thus, the team set out to determine how to improve their performance even further.\nThe first challenge was to obtain better data in terms of both quantity and quality. They could host a larger study to get a more comprehensive data set, but the data would be so noisy and large that it would be difficult to aggregate, scrub, and analyze. This is where Edge Impulse comes in.\nWe hosted a massive sleep study of 100 men and women between the ages of 15 and 73 across three continents (Asia, Europe, and North America). In addition to wearing the Oura Ring, participants were responsible for undergoing the industry standard PSG testing, which provided a “label” for this data set. With 440 nights of sleep from 106 participants, the data set totaled 3,444 hours in length across Ring and PSG data. With Edge Impulse, Oura could easily upload and consolidate data from different sources into a private S3 bucket. They were also able to set up a Data Pipeline to merge data samples into individual files and preprocess the data without having to conduct manual scrubbing.\nBecause of the time saved on data processing thanks to Edge Impulse, the Oura team could focus on the key drivers of their prediction. They only extracted three types of sensor data: heart rate, motion, and body temperature. After partitioning the data using five-fold cross-validation and classifying sleep stages, the team achieved a correlation of 79% - just a few percentage points off the standard. They readily deployed two types of sleep detection models: one simplified using just the ring’s accelerometer and one more comprehensive leveraging Autonomic Nervous System (ANS)-mediated peripheral signals and circadian features. With Edge Impulse, they plan to conduct further analyses of different activity types and leverage the platform’s scalability to continue experimenting with different data sources and subsets of extracted features.\nWhile most ML research focuses on model-dominant steps such as training and finetuning, this case study underscores the importance of a holistic approach to ML Ops, where even the initial steps of data aggregation and preprocessing fundamentally impact successful outcomes.\n\n\n13.9.2 ClinAIOps\nLet’s look at MLOps in the context of medical health monitoring to better understand how MLOps “matures” in a real-world deployment. Specifically, let’s consider continuous therapeutic monitoring (CTM) enabled by wearable devices and sensors. CTM captures detailed physiological data from patients, providing the opportunity for more frequent and personalized adjustments to treatments.\nWearable ML-enabled sensors enable continuous physiological and activity monitoring outside clinics, opening up possibilities for timely, data-driven therapy adjustments. For example, wearable insulin biosensors (Psoma and Kanthou 2023) and wrist-worn ECG sensors for glucose monitoring (Li et al. 2021) can automate insulin dosing for diabetes, wrist-worn ECG and PPG sensors can adjust blood thinners based on atrial fibrillation patterns (Attia et al. 2018; Guo et al. 2019), and accelerometers tracking gait can trigger preventative care for declining mobility in the elderly (Liu et al. 2022). The variety of signals that can now be captured passively and continuously allows therapy titration and optimization tailored to each patient’s changing needs. By closing the loop between physiological sensing and therapeutic response with TinyML and on-device learning, wearables are poised to transform many areas of personalized medicine.\n\nPsoma, Sotiria D., and Chryso Kanthou. 2023. “Wearable Insulin Biosensors for Diabetes Management: Advances and Challenges.” Biosensors 13 (7): 719. https://doi.org/10.3390/bios13070719.\n\nLi, Jingzhen, Igbe Tobore, Yuhang Liu, Abhishek Kandwal, Lei Wang, and Zedong Nie. 2021. “Non-Invasive Monitoring of Three Glucose Ranges Based on ECG by Using DBSCAN-CNN.” #IEEE_J_BHI# 25 (9): 3340–50. https://doi.org/10.1109/jbhi.2021.3072628.\n\nAttia, Zachi I., Alan Sugrue, Samuel J. Asirvatham, Michael J. Ackerman, Suraj Kapa, Paul A. Friedman, and Peter A. Noseworthy. 2018. “Noninvasive Assessment of Dofetilide Plasma Concentration Using a Deep Learning (Neural Network) Analysis of the Surface Electrocardiogram: A Proof of Concept Study.” PLoS One 13 (8): e0201059. https://doi.org/10.1371/journal.pone.0201059.\n\nGuo, Yutao, Hao Wang, Hui Zhang, Tong Liu, Zhaoguang Liang, Yunlong Xia, Li Yan, et al. 2019. “Mobile Photoplethysmographic Technology to Detect Atrial Fibrillation.” J. Am. Coll. Cardiol. 74 (19): 2365–75. https://doi.org/10.1016/j.jacc.2019.08.019.\n\nLiu, Yingcheng, Guo Zhang, Christopher G. Tarolli, Rumen Hristov, Stella Jensen-Roberts, Emma M. Waddell, Taylor L. Myers, et al. 2022. “Monitoring Gait at Home with Radio Waves in Parkinson’s Disease: A Marker of Severity, Progression, and Medication Response.” Sci. Transl. Med. 14 (663): eadc9669. https://doi.org/10.1126/scitranslmed.adc9669.\nML holds great promise in analyzing CTM data to provide data-driven recommendations for therapy adjustments. But simply deploying AI models in silos, without integrating them properly into clinical workflows and decision-making, can lead to poor adoption or suboptimal outcomes. In other words, thinking about MLOps alone is insufficient to make them useful in practice. This study shows that frameworks are needed to incorporate AI and CTM into real-world clinical practice seamlessly.\nThis case study analyzes “ClinAIOps” as a model for embedded ML operations in complex clinical environments (Chen et al. 2023). We provide an overview of the framework and why it’s needed, walk through an application example, and discuss key implementation challenges related to model monitoring, workflow integration, and stakeholder incentives. Analyzing real-world examples like ClinAIOps illuminates crucial principles and best practices for reliable and effective AI Ops across many domains.\nTraditional MLOps frameworks are insufficient for integrating continuous therapeutic monitoring (CTM) and AI in clinical settings for a few key reasons:\n\nMLOps focuses on the ML model lifecycle—training, deployment, monitoring. But healthcare involves coordinating multiple human stakeholders—patients and clinicians—not just models.\nMLOps aims to automate IT system monitoring and management. However, optimizing patient health requires personalized care and human oversight, not just automation.\nCTM and healthcare delivery are complex sociotechnical systems with many moving parts. MLOps doesn’t provide a framework for coordinating human and AI decision-making.\nEthical considerations regarding healthcare AI require human judgment, oversight, and accountability. MLOps frameworks lack processes for ethical oversight.\nPatient health data is highly sensitive and regulated. MLOps alone doesn’t ensure the handling of protected health information to privacy and regulatory standards.\nClinical validation of AI-guided treatment plans is essential for provider adoption. MLOps doesn’t incorporate domain-specific evaluation of model recommendations.\nOptimizing healthcare metrics like patient outcomes requires aligning stakeholder incentives and workflows, which pure tech-focused MLOps overlooks.\n\nThus, effectively integrating AI/ML and CTM in clinical practice requires more than just model and data pipelines; it requires coordinating complex human-AI collaborative decision-making, which ClinAIOps aims to address via its multi-stakeholder feedback loops.\n\nFeedback Loops\nThe ClinAIOps framework, shown in Figure 13.8, provides these mechanisms through three feedback loops. The loops are useful for coordinating the insights from continuous physiological monitoring, clinician expertise, and AI guidance via feedback loops, enabling data-driven precision medicine while maintaining human accountability. ClinAIOps provides a model for effective human-AI symbiosis in healthcare: the patient is at the center, providing health challenges and goals that inform the therapy regimen; the clinician oversees this regimen, giving inputs for adjustments based on continuous monitoring data and health reports from the patient; whereas AI developers play a crucial role by creating systems that generate alerts for therapy updates, which the clinician then vets.\nThese feedback loops, which we will discuss below, help maintain clinician responsibility and control over treatment plans by reviewing AI suggestions before they impact patients. They help dynamically customize AI model behavior and outputs to each patient’s changing health status. They help improve model accuracy and clinical utility over time by learning from clinician and patient responses. They facilitate shared decision-making and personalized care during patient-clinician interactions. They enable rapid optimization of therapies based on frequent patient data that clinicians cannot manually analyze.\n\n\n\n\n\n\nFigure 13.8: ClinAIOps cycle. Credit: Chen et al. (2023).\n\n\n\n\nPatient-AI Loop\nThe patient-AI loop enables frequent therapy optimization driven by continuous physiological monitoring. Patients are prescribed wearables like smartwatches or skin patches to collect relevant health signals passively. For example, a diabetic patient could have a continuous glucose monitor, or a heart disease patient may wear an ECG patch. An AI model analyzes the patient’s longitudinal health data streams in the context of their electronic medical records - their diagnoses, lab tests, medications, and demographics. The AI model suggests adjustments to the treatment regimen tailored to that individual, like changing a medication dose or administration schedule. Minor adjustments within a pre-approved safe range can be made by the patient independently, while major changes are reviewed by the clinician first. This tight feedback between the patient’s physiology and AI-guided therapy allows data-driven, timely optimizations like automated insulin dosing recommendations based on real-time glucose levels for diabetes patients.\n\n\nClinician-AI Loop\nThe clinician-AI loop allows clinical oversight over AI-generated recommendations to ensure safety and accountability. The AI model provides the clinician with treatment recommendations and easily reviewed summaries of the relevant patient data on which the suggestions are based. For instance, an AI may suggest lowering a hypertension patient’s blood pressure medication dose based on continuously low readings. The clinician can accept, reject, or modify the AI’s proposed prescription changes. This clinician feedback further trains and improves the model. Additionally, the clinician sets the bounds for the types and extent of treatment changes the AI can autonomously recommend to patients. By reviewing AI suggestions, the clinician maintains ultimate treatment authority based on their clinical judgment and accountability. This loop allows them to oversee patient cases with AI assistance efficiently.\n\n\nPatient-Clinician Loop\nInstead of routine data collection, the clinician can focus on interpreting high-level data patterns and collaborating with the patient to set health goals and priorities. The AI assistance will also free up clinicians’ time, allowing them to focus more deeply on listening to patients’ stories and concerns. For instance, the clinician may discuss diet and exercise changes with a diabetes patient to improve their glucose control based on their continuous monitoring data. Appointment frequency can also be dynamically adjusted based on patient progress rather than following a fixed calendar. Freed from basic data gathering, the clinician can provide coaching and care customized to each patient informed by their continuous health data. The patient-clinician relationship is made more productive and personalized.\n\n\n\nHypertension Example\nLet’s consider an example. According to the Centers for Disease Control and Prevention, nearly half of adults have hypertension (48.1%, 119.9 million). Hypertension can be managed through ClinAIOps with the help of wearable sensors using the following approach:\n\nData Collection\nThe data collected would include continuous blood pressure monitoring using a wrist-worn device equipped with photoplethysmography (PPG) and electrocardiography (ECG) sensors to estimate blood pressure (Zhang, Zhou, and Zeng 2017). The wearable would also track the patient’s physical activity via embedded accelerometers. The patient would log any antihypertensive medications they take, along with the time and dose. The patient’s demographic details and medical history from their electronic health record (EHR) would also be incorporated. This multimodal real-world data provides valuable context for the AI model to analyze the patient’s blood pressure patterns, activity levels, medication adherence, and responses to therapy.\n\nZhang, Qingxue, Dian Zhou, and Xuan Zeng. 2017. “Highly Wearable Cuff-Less Blood Pressure and Heart Rate Monitoring with Single-Arm Electrocardiogram and Photoplethysmogram Signals.” BioMedical Engineering OnLine 16 (1): 23. https://doi.org/10.1186/s12938-017-0317-z.\n\n\nAI Model\nThe on-device AI model would analyze the patient’s continuous blood pressure trends, circadian patterns, physical activity levels, medication adherence behaviors, and other contexts. It would use ML to predict optimal antihypertensive medication doses and timing to control the individual’s blood pressure. The model would send dosage change recommendations directly to the patient for minor adjustments or to the reviewing clinician for approval for more significant modifications. By observing clinician feedback on its recommendations and evaluating the resulting blood pressure outcomes in patients, the AI model could be continually retrained and improved to enhance performance. The goal is fully personalized blood pressure management optimized for each patient’s needs and responses.\n\n\nPatient-AI Loop\nIn the Patient-AI loop, the hypertensive patient would receive notifications on their wearable device or tethered smartphone app recommending adjustments to their antihypertensive medications. For minor dose changes within a pre-defined safe range, the patient could independently implement the AI model’s suggested adjustment to their regimen. However, the patient must obtain clinician approval before changing their dosage for more significant modifications. Providing personalized and timely medication recommendations automates an element of hypertension self-management for the patient. It can improve their adherence to the regimen as well as treatment outcomes. The patient is empowered to leverage AI insights to control their blood pressure better.\n\n\nClinician-AI Loop\nIn the Clinician-AI loop, the provider would receive summaries of the patient’s continuous blood pressure trends and visualizations of their medication-taking patterns and adherence. They review the AI model’s suggested antihypertensive dosage changes and decide whether to approve, reject, or modify the recommendations before they reach the patient. The clinician also specifies the boundaries for how much the AI can independently recommend changing dosages without clinician oversight. If the patient’s blood pressure is trending at dangerous levels, the system alerts the clinician so they can promptly intervene and adjust medications or request an emergency room visit. This loop maintains accountability and safety while allowing the clinician to harness AI insights by keeping the clinician in charge of approving major treatment changes.\n\n\nPatient-Clinician Loop\nIn the Patient-Clinician loop, shown in Figure 13.9, the in-person visits would focus less on collecting data or basic medication adjustments. Instead, the clinician could interpret high-level trends and patterns in the patient’s continuous monitoring data and have focused discussions about diet, exercise, stress management, and other lifestyle changes to improve their blood pressure control holistically. The frequency of appointments could be dynamically optimized based on the patient’s stability rather than following a fixed calendar. Since the clinician would not need to review all the granular data, they could concentrate on delivering personalized care and recommendations during visits. With continuous monitoring and AI-assisted optimization of medications between visits, the clinician-patient relationship focuses on overall wellness goals and becomes more impactful. This proactive and tailored data-driven approach can help avoid hypertension complications like stroke, heart failure, and other threats to patient health and well-being.\n\n\n\n\n\n\nFigure 13.9: ClinAIOps interactive loop. Credit: Chen et al. (2023).\n\n\nChen, Emma, Shvetank Prakash, Vijay Janapa Reddi, David Kim, and Pranav Rajpurkar. 2023. “A Framework for Integrating Artificial Intelligence for Clinical Care with Continuous Therapeutic Monitoring.” Nat. Biomed. Eng., November. https://doi.org/10.1038/s41551-023-01115-0.\n\n\n\n\n\nMLOps vs. ClinAIOps\nThe hypertension example illustrates well why traditional MLOps are insufficient for many real-world AI applications and why frameworks like ClinAIOps are needed instead.\nWith hypertension, simply developing and deploying an ML model for adjusting medications would only succeed if it considered the broader clinical context. The patient, clinician, and health system have concerns about shaping adoption. The AI model cannot optimize blood pressure outcomes alone—it requires integrating with workflows, behaviors, and incentives.\n\nSome key gaps the example highlights in a pure MLOps approach:\nThe model itself would lack the real-world patient data at scale to recommend treatments reliably. ClinAIOps enables this by collecting feedback from clinicians and patients via continuous monitoring.\nClinicians would only trust model recommendations with transparency, explainability, and accountability. ClinAIOps keeps the clinician in the loop to build confidence.\nPatients need personalized coaching and motivation - not just AI notifications. The ClinAIOps patient-clinician loop facilitates this.\nSensor reliability and data accuracy would only be sufficient with clinical oversight. ClinAIOps validates recommendations.\nLiability for treatment outcomes must be clarified with just an ML model. ClinAIOps maintains human accountability.\nHealth systems would need to demonstrate value to change workflows. ClinAIOps aligns stakeholders.\n\nThe hypertension case clearly shows the need to look beyond training and deploying a performant ML model to consider the entire human-AI sociotechnical system. This is the key gap ClinAIOps aims to address over traditional MLOps. Traditional MLOps is overly tech-focused on automating ML model development and deployment, while ClinAIOps incorporates clinical context and human-AI coordination through multi-stakeholder feedback loops.\nTable 13.2 compares them. This table highlights how, when MLOps is implemented, we need to consider more than just ML models.\n\n\n\nTable 13.2: Comparison of MLOps versus AI operations for clinical use.\n\n\n\n\n\n\n\n\n\n\n\nTraditional MLOps\nClinAIOps\n\n\n\n\nFocus\nML model development and deployment\nCoordinating human and AI decision-making\n\n\nStakeholders\nData scientists, IT engineers\nPatients, clinicians, AI developers\n\n\nFeedback loops\nModel retraining, monitoring\nPatient-AI, clinician-AI, patient-clinician\n\n\nObjective\nOperationalize ML deployments\nOptimize patient health outcomes\n\n\nProcesses\nAutomated pipelines and infrastructure\nIntegrates clinical workflows and oversight\n\n\nData considerations\nBuilding training datasets\nPrivacy, ethics, protected health information\n\n\nModel validation\nTesting model performance metrics\nClinical evaluation of recommendations\n\n\nImplementation\nFocuses on technical integration\nAligns incentives of human stakeholders\n\n\n\n\n\n\n\n\nSummary\nIn complex domains like healthcare, successfully deploying AI requires moving beyond a narrow focus on training and deploying performant ML models. As illustrated through the hypertension example, real-world integration of AI necessitates coordinating diverse stakeholders, aligning incentives, validating recommendations, and maintaining accountability. Frameworks like ClinAIOps, which facilitate collaborative human-AI decision-making through integrated feedback loops, are needed to address these multifaceted challenges. Rather than just automating tasks, AI must augment human capabilities and clinical workflows. This allows AI to positively impact patient outcomes, population health, and healthcare efficiency.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1463,7 +1463,7 @@ "section": "13.10 Conclusion", "text": "13.10 Conclusion\nEmbedded ML is poised to transform many industries by enabling AI capabilities directly on edge devices like smartphones, sensors, and IoT hardware. However, developing and deploying TinyML models on resource-constrained embedded systems poses unique challenges compared to traditional cloud-based MLOps.\nThis chapter provided an in-depth analysis of key differences between traditional and embedded MLOps across the model lifecycle, development workflows, infrastructure management, and operational practices. We discussed how factors like intermittent connectivity, decentralized data, and limited on-device computing necessitate innovative techniques like federated learning, on-device inference, and model optimization. Architectural patterns like cross-device learning and hierarchical edge-cloud infrastructure help mitigate constraints.\nThrough concrete examples like Oura Ring and ClinAIOps, we demonstrated applied principles for embedded MLOps. The case studies highlighted critical considerations beyond core ML engineering, like aligning stakeholder incentives, maintaining accountability, and coordinating human-AI decision-making. This underscores the need for a holistic approach spanning both technical and human elements.\nWhile embedded MLOps face impediments, emerging tools like Edge Impulse and lessons from pioneers help accelerate TinyML innovation. A solid understanding of foundational MLOps principles tailored to embedded environments will empower more organizations to overcome constraints and deliver distributed AI capabilities. As frameworks and best practices mature, seamlessly integrating ML into edge devices and processes will transform industries through localized intelligence.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1472,9 +1472,9 @@ "href": "contents/ops/ops.html#sec-embedded-aiops-resource", "title": "13  ML Operations", "section": "13.11 Resources", - "text": "13.11 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides serve as a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nMLOps, DevOps, and AIOps.\nMLOps overview.\nTiny MLOps.\nMLOps: a use case.\nMLOps: Key Activities and Lifecycle.\nML Lifecycle.\nScaling TinyML: Challenges and Opportunities.\nTraining Operationalization:\n\nTraining Ops: CI/CD trigger.\nContinuous Integration.\nContinuous Deployment.\nProduction Deployment.\nProduction Deployment: Online Experimentation.\nTraining Ops Impact on MLOps.\n\nModel Deployment:\n\nScaling ML Into Production Deployment.\nContainers for Scaling ML Deployment.\nChallenges for Scaling TinyML Deployment: Part 1.\nChallenges for Scaling TinyML Deployment: Part 2.\nModel Deployment Impact on MLOps.\n\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 13.1\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we also offer a series of hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "13.11 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides serve as a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nMLOps, DevOps, and AIOps.\nMLOps overview.\nTiny MLOps.\nMLOps: a use case.\nMLOps: Key Activities and Lifecycle.\nML Lifecycle.\nScaling TinyML: Challenges and Opportunities.\nTraining Operationalization:\n\nTraining Ops: CI/CD trigger.\nContinuous Integration.\nContinuous Deployment.\nProduction Deployment.\nProduction Deployment: Online Experimentation.\nTraining Ops Impact on MLOps.\n\nModel Deployment:\n\nScaling ML Into Production Deployment.\nContainers for Scaling ML Deployment.\nChallenges for Scaling TinyML Deployment: Part 1.\nChallenges for Scaling TinyML Deployment: Part 2.\nModel Deployment Impact on MLOps.\n\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 13.1\nVideo 13.2\nVideo 13.3\nVideo 13.4\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 13.1\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we also offer a series of hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Deployment", "13  ML Operations" ] }, @@ -1485,7 +1485,7 @@ "section": "", "text": "14.1 Introduction\nMachine learning has evolved substantially from its academic origins, where privacy was not a primary concern. As ML migrated into commercial and consumer applications, the data became more sensitive - encompassing personal information like communications, purchases, and health data. This explosion of data availability fueled rapid advancements in ML capabilities. However, it also exposed new privacy risks, as demonstrated by incidents like the AOL data leak in 2006 and the Cambridge Analytica scandal.\nThese events highlighted the growing need to address privacy in ML systems. In this chapter, we explore privacy and security considerations together, as they are inherently linked in ML:\nFor example, an ML-powered home security camera must secure video feeds against unauthorized access and provide privacy protections to ensure only intended users can view the footage. A breach of either security or privacy could expose private user moments.\nEmbedded ML systems like smart assistants and wearables are ubiquitous and process intimate user data. However, their computational constraints often prevent heavy security protocols. Designers must balance performance needs with rigorous security and privacy standards tailored to embedded hardware limitations.\nThis chapter provides essential knowledge for addressing the complex privacy and security landscape of embedded ML. We will explore vulnerabilities and cover various techniques that enhance privacy and security within embedded systems’ resource constraints.\nWe hope that by building a holistic understanding of risks and safeguards, you will gain the principles to develop secure, ethical, embedded ML applications.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1496,7 +1496,7 @@ "section": "", "text": "Privacy refers to controlling access to sensitive user data, such as financial information or biometric data collected by an ML application.\nSecurity protects ML systems and data from hacking, theft, and misuse.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1507,7 +1507,7 @@ "section": "14.2 Terminology", "text": "14.2 Terminology\nIn this chapter, we will discuss security and privacy together, so there are key terms that we need to be clear about.\n\nPrivacy: Consider an ML-powered home security camera that identifies and records potential threats. This camera records identifiable information, including faces, of individuals approaching and potentially entering this home. Privacy concerns may surround who can access this data.\nSecurity: Consider an ML-powered home security camera that identifies and records potential threats. The security aspect would ensure that hackers cannot access these video feeds and recognition models.\nThreat: Using our home security camera example, a threat could be a hacker trying to access live feeds or stored videos or using false inputs to trick the system.\nVulnerability: A common vulnerability might be a poorly secured network through which the camera connects to the internet, which could be exploited to access the data.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1516,9 +1516,9 @@ "href": "contents/privacy_security/privacy_security.html#historical-precedents", "title": "14  Security & Privacy", "section": "14.3 Historical Precedents", - "text": "14.3 Historical Precedents\nWhile the specifics of machine learning hardware security can be distinct, the embedded systems field has a history of security incidents that provide critical lessons for all connected systems, including those using ML. Here are detailed explorations of past breaches:\n\n14.3.1 Stuxnet\nIn 2010, something unexpected was found on a computer in Iran - a very complicated computer virus that experts had never seen before. Stuxnet was a malicious computer worm that targeted supervisory control and data acquisition (SCADA) systems and was designed to damage Iran’s nuclear program (Farwell and Rohozinski 2011). Stuxnet was using four “zero-day exploits” - attacks that take advantage of secret weaknesses in software that no one knows about yet. This made Stuxnet very sneaky and hard to detect.\n\nFarwell, James P., and Rafal Rohozinski. 2011. “Stuxnet and the Future of Cyber War.” Survival 53 (1): 23–40. https://doi.org/10.1080/00396338.2011.555586.\nBut Stuxnet wasn’t designed to steal information or spy on people. Its goal was physical destruction - to sabotage centrifuges at Iran’s Natanz nuclear plant! So how did the virus get onto computers at the Natanz plant, which was supposed to be disconnected from the outside world for security? Experts think someone inserted a USB stick containing Stuxnet into the internal Natanz network. This allowed the virus to “jump” from an outside system onto the isolated nuclear control systems and wreak havoc.\nStuxnet was incredibly advanced malware built by national governments to cross from the digital realm into real-world infrastructure. It specifically targeted important industrial machines, where embedded machine learning is highly applicable in a way never done before. The virus provided a wake-up call about how sophisticated cyberattacks could now physically destroy equipment and facilities.\nThis breach was significant due to its sophistication; Stuxnet specifically targeted programmable logic controllers (PLCs) used to automate electromechanical processes such as the speed of centrifuges for uranium enrichment. The worm exploited vulnerabilities in the Windows operating system to gain access to the Siemens Step7 software controlling the PLCs. Despite not being a direct attack on ML systems, Stuxnet is relevant for all embedded systems as it showcases the potential for state-level actors to design attacks that bridge the cyber and physical worlds with devastating effects.\n\n\n14.3.2 Jeep Cherokee Hack\nThe Jeep Cherokee hack was a groundbreaking event demonstrating the risks inherent in increasingly connected automobiles (Miller 2019). In a controlled demonstration, security researchers remotely exploited a vulnerability in the Uconnect entertainment system, which had a cellular connection to the internet. They were able to control the vehicle’s engine, transmission, and brakes, alarming the automotive industry into recognizing the severe safety implications of cyber vulnerabilities in vehicles.\n\nMiller, Charlie. 2019. “Lessons Learned from Hacking a Car.” IEEE Design &Amp; Test 36 (6): 7–9. https://doi.org/10.1109/mdat.2018.2863106.\nThe video below is a short documentary of the attack.\n\nWhile this wasn’t an attack on an ML system per se, the reliance of modern vehicles on embedded systems for safety-critical functions has significant parallels to the deployment of ML in embedded systems, underscoring the need for robust security at the hardware level.\n\n\n14.3.3 Mirai Botnet\nThe Mirai botnet involved the infection of networked devices such as digital cameras and DVR players (Antonakakis et al. 2017). In October 2016, the botnet was used to conduct one of the largest DDoS attacks, disrupting internet access across the United States. The attack was possible because many devices used default usernames and passwords, which were easily exploited by the Mirai malware to control the devices.\n\nAntonakakis, Manos, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, Jaime Cochran, Zakir Durumeric, et al. 2017. “Understanding the Mirai Botnet.” In 26th USENIX Security Symposium (USENIX Security 17), 1093–1110.\nThe following video presentation explains how the Mirai Botnet works.\n\nAlthough the devices were not ML-based, the incident is a stark reminder of what can happen when numerous embedded devices with poor security controls are networked, which is becoming more common with the growth of ML-based IoT devices.\n\n\n14.3.4 Implications\nThese historical breaches demonstrate the cascading effects of hardware vulnerabilities in embedded systems. Each incident offers a precedent for understanding the risks and designing better security protocols. For instance, the Mirai botnet highlights the immense destructive potential when threat actors can gain control over networked devices with weak security, a situation becoming increasingly common with ML systems. Many current ML devices function as “edge” devices meant to collect and process data locally before sending it to the cloud. Much like the cameras and DVRs compromised by Mirai, edge ML devices often rely on embedded hardware like ARM processors and run lightweight O.S. like Linux. Securing the device credentials is critical.\nSimilarly, the Jeep Cherokee hack was a watershed moment for the automotive industry. It exposed serious vulnerabilities in the growing network-connected vehicle systems and their lack of isolation from core drive systems like brakes and steering. In response, auto manufacturers invested heavily in new cybersecurity measures, though gaps likely remain.\nChrysler did a recall to patch the vulnerable Uconnect software, allowing the remote exploit. This included adding network-level protections to prevent unauthorized external access and compartmentalizing in-vehicle systems to limit lateral movement. Additional layers of encryption were added for commands sent over the CAN bus within vehicles.\nThe incident also spurred the creation of new cybersecurity standards and best practices. The Auto-ISAC was established for automakers to share intelligence, and the NHTSA guided management risks. New testing and audit procedures were developed to assess vulnerabilities proactively. The aftereffects continue to drive change in the automotive industry as cars become increasingly software-defined.\nUnfortunately, manufacturers often overlook security in the rush to develop new ML edge devices - using default passwords, unencrypted communications, unsecured firmware updates, etc. Any such vulnerabilities could allow attackers to gain access and control devices at scale by infecting them with malware. With a botnet of compromised ML devices, attackers could leverage their aggregated computational power for DDoS attacks on critical infrastructure.\nWhile these events didn’t directly involve machine learning hardware, the principles of the attacks carry over to ML systems, which often involve similar embedded devices and network architectures. As ML hardware often operates in continuous interaction with the physical world, securing it against such breaches is paramount. The evolution of security measures in response to these incidents provides valuable insights into protecting current and future ML systems from analogous vulnerabilities.\nThe distributed nature of ML edge devices means threats can propagate quickly across networks. And if devices are being used for mission-critical purposes like medical devices, industrial controls, or self-driving vehicles, the potential physical damage from weaponized ML bots could be severe. Just like Mirai demonstrated the dangerous potential of poorly secured IoT devices, the litmus test for ML hardware security will be how vulnerable or resilient these devices are to worm-like attacks. The stakes are raised as ML spreads to safety-critical domains, putting the onus on manufacturers and system operators to incorporate the lessons from Mirai.\nThe lesson is the importance of designing for security from the outset and having layered defenses. For ML systems, the Jeep case highlights potential blindspots around externally facing software interfaces and isolation between subsystems. Manufacturers of ML devices and platforms should assume a similar proactive and comprehensive approach to security rather than leaving it as an afterthought. Rapid response and dissemination of best practices will be key as threats evolve.", + "text": "14.3 Historical Precedents\nWhile the specifics of machine learning hardware security can be distinct, the embedded systems field has a history of security incidents that provide critical lessons for all connected systems, including those using ML. Here are detailed explorations of past breaches:\n\n14.3.1 Stuxnet\nIn 2010, something unexpected was found on a computer in Iran - a very complicated computer virus that experts had never seen before. Stuxnet was a malicious computer worm that targeted supervisory control and data acquisition (SCADA) systems and was designed to damage Iran’s nuclear program (Farwell and Rohozinski 2011). Stuxnet was using four “zero-day exploits” - attacks that take advantage of secret weaknesses in software that no one knows about yet. This made Stuxnet very sneaky and hard to detect.\n\nFarwell, James P., and Rafal Rohozinski. 2011. “Stuxnet and the Future of Cyber War.” Survival 53 (1): 23–40. https://doi.org/10.1080/00396338.2011.555586.\nBut Stuxnet wasn’t designed to steal information or spy on people. Its goal was physical destruction - to sabotage centrifuges at Iran’s Natanz nuclear plant! So how did the virus get onto computers at the Natanz plant, which was supposed to be disconnected from the outside world for security? Experts think someone inserted a USB stick containing Stuxnet into the internal Natanz network. This allowed the virus to “jump” from an outside system onto the isolated nuclear control systems and wreak havoc.\nStuxnet was incredibly advanced malware built by national governments to cross from the digital realm into real-world infrastructure. It specifically targeted important industrial machines, where embedded machine learning is highly applicable in a way never done before. The virus provided a wake-up call about how sophisticated cyberattacks could now physically destroy equipment and facilities.\nThis breach was significant due to its sophistication; Stuxnet specifically targeted programmable logic controllers (PLCs) used to automate electromechanical processes such as the speed of centrifuges for uranium enrichment. The worm exploited vulnerabilities in the Windows operating system to gain access to the Siemens Step7 software controlling the PLCs. Despite not being a direct attack on ML systems, Stuxnet is relevant for all embedded systems as it showcases the potential for state-level actors to design attacks that bridge the cyber and physical worlds with devastating effects.\n\n\n14.3.2 Jeep Cherokee Hack\nThe Jeep Cherokee hack was a groundbreaking event demonstrating the risks inherent in increasingly connected automobiles (Miller 2019). In a controlled demonstration, security researchers remotely exploited a vulnerability in the Uconnect entertainment system, which had a cellular connection to the internet. They were able to control the vehicle’s engine, transmission, and brakes, alarming the automotive industry into recognizing the severe safety implications of cyber vulnerabilities in vehicles. Video 14.1 below is a short documentary of the attack.\n\nMiller, Charlie. 2019. “Lessons Learned from Hacking a Car.” IEEE Design &Amp; Test 36 (6): 7–9. https://doi.org/10.1109/mdat.2018.2863106.\n\n\n\n\n\n\nVideo 14.1: Jeep Cherokee Hack\n\n\n\n\n\n\nWhile this wasn’t an attack on an ML system per se, the reliance of modern vehicles on embedded systems for safety-critical functions has significant parallels to the deployment of ML in embedded systems, underscoring the need for robust security at the hardware level.\n\n\n14.3.3 Mirai Botnet\nThe Mirai botnet involved the infection of networked devices such as digital cameras and DVR players (Antonakakis et al. 2017). In October 2016, the botnet was used to conduct one of the largest DDoS attacks, disrupting internet access across the United States. The attack was possible because many devices used default usernames and passwords, which were easily exploited by the Mirai malware to control the devices. Video 14.2 explains how the Mirai Botnet works.\n\nAntonakakis, Manos, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, Jaime Cochran, Zakir Durumeric, et al. 2017. “Understanding the Mirai Botnet.” In 26th USENIX Security Symposium (USENIX Security 17), 1093–1110.\n\n\n\n\n\n\nVideo 14.2: Jeep Cherokee Hack\n\n\n\n\n\n\nAlthough the devices were not ML-based, the incident is a stark reminder of what can happen when numerous embedded devices with poor security controls are networked, which is becoming more common with the growth of ML-based IoT devices.\n\n\n14.3.4 Implications\nThese historical breaches demonstrate the cascading effects of hardware vulnerabilities in embedded systems. Each incident offers a precedent for understanding the risks and designing better security protocols. For instance, the Mirai botnet highlights the immense destructive potential when threat actors can gain control over networked devices with weak security, a situation becoming increasingly common with ML systems. Many current ML devices function as “edge” devices meant to collect and process data locally before sending it to the cloud. Much like the cameras and DVRs compromised by Mirai, edge ML devices often rely on embedded hardware like ARM processors and run lightweight O.S. like Linux. Securing the device credentials is critical.\nSimilarly, the Jeep Cherokee hack was a watershed moment for the automotive industry. It exposed serious vulnerabilities in the growing network-connected vehicle systems and their lack of isolation from core drive systems like brakes and steering. In response, auto manufacturers invested heavily in new cybersecurity measures, though gaps likely remain.\nChrysler did a recall to patch the vulnerable Uconnect software, allowing the remote exploit. This included adding network-level protections to prevent unauthorized external access and compartmentalizing in-vehicle systems to limit lateral movement. Additional layers of encryption were added for commands sent over the CAN bus within vehicles.\nThe incident also spurred the creation of new cybersecurity standards and best practices. The Auto-ISAC was established for automakers to share intelligence, and the NHTSA guided management risks. New testing and audit procedures were developed to assess vulnerabilities proactively. The aftereffects continue to drive change in the automotive industry as cars become increasingly software-defined.\nUnfortunately, manufacturers often overlook security in the rush to develop new ML edge devices - using default passwords, unencrypted communications, unsecured firmware updates, etc. Any such vulnerabilities could allow attackers to gain access and control devices at scale by infecting them with malware. With a botnet of compromised ML devices, attackers could leverage their aggregated computational power for DDoS attacks on critical infrastructure.\nWhile these events didn’t directly involve machine learning hardware, the principles of the attacks carry over to ML systems, which often involve similar embedded devices and network architectures. As ML hardware often operates in continuous interaction with the physical world, securing it against such breaches is paramount. The evolution of security measures in response to these incidents provides valuable insights into protecting current and future ML systems from analogous vulnerabilities.\nThe distributed nature of ML edge devices means threats can propagate quickly across networks. And if devices are being used for mission-critical purposes like medical devices, industrial controls, or self-driving vehicles, the potential physical damage from weaponized ML bots could be severe. Just like Mirai demonstrated the dangerous potential of poorly secured IoT devices, the litmus test for ML hardware security will be how vulnerable or resilient these devices are to worm-like attacks. The stakes are raised as ML spreads to safety-critical domains, putting the onus on manufacturers and system operators to incorporate the lessons from Mirai.\nThe lesson is the importance of designing for security from the outset and having layered defenses. For ML systems, the Jeep case highlights potential blindspots around externally facing software interfaces and isolation between subsystems. Manufacturers of ML devices and platforms should assume a similar proactive and comprehensive approach to security rather than leaving it as an afterthought. Rapid response and dissemination of best practices will be key as threats evolve.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1529,7 +1529,7 @@ "section": "14.4 Security Threats to ML Models", "text": "14.4 Security Threats to ML Models\nML models face security risks that can undermine their integrity, performance, and trustworthiness if not properly addressed. While there are several different threats, the key threats include: Model theft, where adversaries steal the proprietary model parameters and the sensitive data they contain. Data poisoning, which compromises models through data tampering. Adversarial attacks deceive the model to make incorrect or unwanted predictions.\n\n14.4.1 Model Theft\nModel theft occurs when an attacker gains unauthorized access to a deployed ML model. The concern here is the theft of the model’s structure and trained parameters and the proprietary data it contains (Ateniese et al. 2015). Model theft is a real and growing threat, as demonstrated by cases like ex-Google engineer Anthony Levandowski, who allegedly stole Waymo’s self-driving car designs and started a competing company. Beyond economic impacts, model theft can seriously undermine privacy and enable further attacks.\n\nAteniese, Giuseppe, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. 2015. “Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers.” Int. J. Secur. Netw. 10 (3): 137. https://doi.org/10.1504/ijsn.2015.071829.\nFor instance, consider an ML model developed for personalized recommendations in an e-commerce application. If a competitor steals this model, they gain insights into business analytics, customer preferences, and even trade secrets embedded within the model’s data. Attackers could leverage stolen models to craft more effective inputs for model inversion attacks, deducing private details about the model’s training data. A cloned e-commerce recommendation model could reveal customer purchase behaviors and demographics.\nTo understand model inversion attacks, consider a facial recognition system used to grant access to secured facilities. The system is trained on a dataset of employee photos. An attacker could infer features of the original dataset by observing the model’s output to various inputs. For example, suppose the model’s confidence level for a particular face is significantly higher for a given set of features. In that case, an attacker might deduce that someone with those features is likely in the training dataset.\nThe methodology of model inversion typically involves the following steps:\n\nAccessing Model Outputs: The attacker queries the ML model with input data and observes the outputs. This is often done through a legitimate interface, like a public API.\nAnalyzing Confidence Scores: For each input, the model provides a confidence score that reflects how similar the input is to the training data.\nReverse-Engineering: By analyzing the confidence scores or output probabilities, attackers can use optimization techniques to reconstruct what they believe is close to the original input data.\n\nOne historical example of such a vulnerability being explored was the research on inversion attacks against the U.S. Netflix Prize dataset, where researchers demonstrated that it was possible to learn about an individual’s movie preferences, which could lead to privacy breaches (Narayanan and Shmatikov 2006).\n\nNarayanan, Arvind, and Vitaly Shmatikov. 2006. “How to Break Anonymity of the Netflix Prize Dataset.” arXiv Preprint Cs/0610105.\nModel theft implies that it could lead to economic losses, undermine competitive advantage, and violate user privacy. There’s also the risk of model inversion attacks, where an adversary could input various data into the stolen model to infer sensitive information about the training data.\nBased on the desired asset, model theft attacks can be divided into two categories: exact model properties and approximate model behavior.\n\nStealing Exact Model Properties\nIn these attacks, the objective is to extract information about concrete metrics, such as a network’s learned parameters, fine-tuned hyperparameters, and the model’s internal layer architecture (Oliynyk, Mayer, and Rauber 2023).\n\nLearned Parameters: Adversaries aim to steal a model’s learned knowledge (weights and biases) in order to replicate it. Parameter theft is generally used in conjunction with other attacks, such as architecture theft, which lacks parameter knowledge.\nFine-Tuned Hyperparameters: Training is costly, and finding the right configuration of hyperparameters (such as the learning rate and regularization) can be a very long and expensive process. Thus, stealing an optimized model’s hyperparameters can allow an adversary to replicate the model without the high training costs.\nModel Architecture: This attack concerns the specific design and structure of the model, such as layers, neurons, and connectivity patterns. Aside from reducing associated training costs, it can provide an attacker; this type of theft is especially dangerous because it concerns core I.P. theft, which can affect a company’s competitive edge. Architecture theft can be achieved by exploiting side-channel attacks (discussed later).\n\n\n\nStealing Approximate Model Behavior\nInstead of focusing on extracting exact numerical values of the model’s parameters, these attacks aim to reproduce the model’s behavior (predictions and effectiveness), decision-making, and high-level characteristics (Oliynyk, Mayer, and Rauber 2023). These techniques aim to achieve similar outcomes while allowing for internal deviations in parameters and architecture. Types of approximate behavior theft include achieving the same level of effectiveness and obtaining prediction consistency.\n\nOliynyk, Daryna, Rudolf Mayer, and Andreas Rauber. 2023. “I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences.” ACM Comput. Surv. 55 (14s): 1–41. https://doi.org/10.1145/3595292.\n\nLevel of Effectiveness: Attackers aim to replicate the model’s decision-making capabilities rather than focus on the precise parameter values. This is done through understanding the overall behavior of the model. Consider a scenario where an attacker wants to copy the behavior of an image classification model. By analyzing the model’s decision boundaries, the attack tunes its model to reach an effectiveness comparable to the original model. This could entail analyzing 1) the confusion matrix to understand the balance of prediction metrics (true positive, true negative, false positive, false negative) and 2)other performance metrics, such as F1 score and precision, to ensure that the two models are comparable.\nPrediction Consistency: The attacker tries to align their model’s prediction patterns with the target model’s. This involves matching prediction outputs (both positive and negative) on the same set of inputs and ensuring distributional consistency across different classes. For instance, consider a natural language processing (NLP) model that generates sentiment analysis for move reviews (labels reviews as positive, neutral, or negative). The attacker will try to fine-tune their model to match the prediction of the original models on the same set of movie reviews. This includes ensuring that the model makes the same mistakes (mispredictions) that the targeted model makes.\n\n\n\nCase Study\nIn 2018, Tesla filed a lawsuit against self-driving car startup Zoox, alleging former employees stole confidential data and trade secrets related to Tesla’s autonomous driving assistance system.\nTesla claimed that several of its former employees took over 10 G.B. of proprietary data, including ML models and source code, before joining Zoox. This allegedly included one of Tesla’s crucial image recognition models for identifying objects.\nThe theft of this sensitive proprietary model could help Zoox shortcut years of ML development and duplicate Tesla’s capabilities. Tesla argued this theft of I.P. caused major financial and competitive harm. There were also concerns it could allow model inversion attacks to infer private details about Tesla’s testing data.\nThe Zoox employees denied stealing any proprietary information. However, the case highlights the significant risks of model theft—enabling the cloning of commercial models, causing economic impacts, and opening the door for further data privacy violations.\n\n\n\n14.4.2 Data Poisoning\nData poisoning is an attack where the training data is tampered with, leading to a compromised model (Biggio, Nelson, and Laskov 2012). Attackers can modify existing training examples, insert new malicious data points, or influence the data collection process. The poisoned data is labeled in such a way as to skew the model’s learned behavior. This can be particularly damaging in applications where ML models make automated decisions based on learned patterns. Beyond training sets, poisoning tests, and validation data can allow adversaries to boost reported model performance artificially.\n\nBiggio, Battista, Blaine Nelson, and Pavel Laskov. 2012. “Poisoning Attacks Against Support Vector Machines.” In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012. icml.cc / Omnipress. http://icml.cc/2012/papers/880.pdf.\nThe process usually involves the following steps:\n\nInjection: The attacker adds incorrect or misleading examples into the training set. These examples are often designed to look normal to cursory inspection but have been carefully crafted to disrupt the learning process.\nTraining: The ML model trains on this manipulated dataset and develops skewed understandings of the data patterns.\nDeployment: Once the model is deployed, the corrupted training leads to flawed decision-making or predictable vulnerabilities the attacker can exploit.\n\nThe impacts of data poisoning extend beyond just classification errors or accuracy drops. For instance, if incorrect or malicious data is introduced into a traffic sign recognition system’s training set, the model may learn to misclassify stop signs as yield signs, which can have dangerous real-world consequences, especially in embedded autonomous systems like autonomous vehicles.\nData poisoning can degrade a model’s accuracy, force it to make incorrect predictions or cause it to behave unpredictably. In critical applications like healthcare, such alterations can lead to significant trust and safety issues.\nThere are six main categories of data poisoning (Oprea, Singhal, and Vassilev 2022):\n\nOprea, Alina, Anoop Singhal, and Apostol Vassilev. 2022. “Poisoning Attacks Against Machine Learning: Can Machine Learning Be Trustworthy?” Computer 55 (11): 94–99. https://doi.org/10.1109/mc.2022.3190787.\n\nAvailability Attacks: These attacks aim to compromise a model’s overall functionality. They cause it to misclassify most testing samples, rendering the model unusable for practical applications. An example is label flipping, where labels of a specific, targeted class are replaced with labels from a different one.\nTargeted Attacks: In contrast to availability attacks, targeted attacks aim to compromise a small number of the testing samples. So, the effect is localized to a limited number of classes, while the model maintains the same original level of accuracy on the majority of the classes. The targeted nature of the attack requires the attacker to possess knowledge of the model’s classes, making detecting these attacks more challenging.\nBackdoor Attacks: In these attacks, an adversary targets specific patterns in the data. The attacker introduces a backdoor(a malicious, hidden trigger or pattern) into the training data, such as manipulating certain features in structured data or manipulating a pattern of pixels at a fixed position. This causes the model to associate the malicious pattern with specific labels. As a result, when the model encounters test samples that contain a malicious pattern, it makes false predictions.\nSubpopulation Attacks: Attackers selectively choose to compromise a subset of the testing samples while maintaining accuracy on the rest of the samples. You can think of these attacks as a combination of availability and targeted attacks: performing availability attacks (performance degradation) within the scope of a targeted subset. Although subpopulation attacks may seem very similar to targeted attacks, the two have clear differences:\nScope: While targeted attacks target a selected set of samples, subpopulation attacks target a general subpopulation with similar feature representations. For example, in a targeted attack, an actor inserts manipulated images of a ‘speed bump’ warning sign(with carefully crafted perturbation or patterns), which causes an autonomous car to fail to recognize such a sign and slow down. On the other hand, manipulating all samples of people with a British accent so that a speech recognition model would misclassify a British person’s speech is an example of a subpopulation attack.\nKnowledge: While targeted attacks require a high degree of familiarity with the data, subpopulation attacks require less intimate knowledge to be effective.\n\n\nCase Study 1\nIn 2017, researchers demonstrated a data poisoning attack against a popular toxicity classification model called Perspective (Hosseini et al. 2017). This ML model detects toxic comments online.\n\nHosseini, Hossein, Sreeram Kannan, Baosen Zhang, and Radha Poovendran. 2017. “Deceiving Google’s Perspective Api Built for Detecting Toxic Comments.” ArXiv Preprint abs/1702.08138. https://arxiv.org/abs/1702.08138.\nThe researchers added synthetically generated toxic comments with slight misspellings and grammatical errors to the model’s training data. This slowly corrupted the model, causing it to misclassify increasing numbers of severely toxic inputs as non-toxic over time.\nAfter retraining on the poisoned data, the model’s false negative rate increased from 1.4% to 27% - allowing extremely toxic comments to bypass detection. The researchers warned this stealthy data poisoning could enable the spread of hate speech, harassment, and abuse if deployed against real moderation systems.\nThis case highlights how data poisoning can degrade model accuracy and reliability. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains.\n\n\nCase Study 2\nInterestingly enough, data poisoning attacks are not always malicious (Shan et al. 2023). Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, utilizes data poisoning to help artists protect their art against scraping and copyright violations by generative A.I. models. Artists can use the tool to modify their images subtly before uploading them online.\nWhile these changes are indiscernible to the human eye, they can significantly disrupt the performance of generative A.I. models when incorporated into the training data. Generative models can be manipulated to generate hallucinations and weird images. For example, with only 300 poisoned images, the University of Chicago researchers could trick the latest Stable Diffusion model into generating images of dogs that look like cats or images of cows when prompted for cars.\nAs the number of poisoned images on the internet increases, the performance of the models that use scraped data will deteriorate exponentially. First, the poisoned data is hard to detect and requires manual elimination. Second, the “poison” spreads quickly to other labels because generative models rely on connections between words and concepts as they generate images. So a poisoned image of a “car” could spread into generated images associated with words like “truck,” “train,” “bus,” etc.\nOn the other hand, this tool can be used maliciously and can affect legitimate applications of the generative models. This shows the very challenging and novel nature of machine learning attacks.\nFigure 17.26 demonstrates the effects of different levels of data poisoning (50 samples, 100 samples, and 300 samples of poisoned images) on generating images in different categories. Notice how the images start deforming and deviating from the desired category. For example, after 300 poison samples, a car prompt generates a cow.\n\n\n\n\n\n\nFigure 14.1: Data poisoning. Credit: Shan et al. (2023).\n\n\nShan, Shawn, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y Zhao. 2023. “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models.” ArXiv Preprint abs/2310.13828. https://arxiv.org/abs/2310.13828.\n\n\n\n\n\n14.4.3 Adversarial Attacks\nAdversarial attacks aim to trick models into making incorrect predictions by providing them with specially crafted, deceptive inputs (called adversarial examples) (Parrish et al. 2023). By adding slight perturbations to input data, adversaries can “hack” a model’s pattern recognition and deceive it. These are sophisticated techniques where slight, often imperceptible alterations to input data can trick an ML model into making a wrong prediction.\n\nParrish, Alicia, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, Oana Inel, Juan Ciro, et al. 2023. “Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.” ArXiv Preprint abs/2305.14384. https://arxiv.org/abs/2305.14384.\n\nRamesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. “Zero-Shot Text-to-Image Generation.” In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, edited by Marina Meila and Tong Zhang, 139:8821–31. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/ramesh21a.html.\n\nRombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. 2022. “High-Resolution Image Synthesis with Latent Diffusion Models.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr52688.2022.01042.\nOne can generate prompts that lead to unsafe images in text-to-image models like DALLE (Ramesh et al. 2021) or Stable Diffusion (Rombach et al. 2022). For example, by altering the pixel values of an image, attackers can deceive a facial recognition system into identifying a face as a different person.\nAdversarial attacks exploit the way ML models learn and make decisions during inference. These models work on the principle of recognizing patterns in data. An adversary crafts special inputs with perturbations to mislead the model’s pattern recognition—essentially ‘hacking’ the model’s perceptions.\nAdversarial attacks fall under different scenarios:\n\nWhitebox Attacks: The attacker has full knowledge of the target model’s internal workings, including the training data,parameters, and architecture. This comprehensive access creates favorable conditions for exploiting the model’s vulnerabilities. The attacker can use specific and subtle weaknesses to craft effective adversarial examples.\nBlackbox Attacks: In contrast to whitebox attacks, in blackbox attacks, the attacker has little to no knowledge of the target model. To carry out the attack, the adversarial actor must carefully observe the model’s output behavior.\nGreybox Attacks: These fall between blackbox and whitebox attacks. The attacker has only partial knowledge about the target model’s internal design. For example, the attacker could have knowledge about training data but not the architecture or parameters. In the real world, practical attacks fall under black black-box box grey-boxes.\n\nThe landscape of machine learning models is complex and broad, especially given their relatively recent integration into commercial applications. This rapid adoption, while transformative, has brought to light numerous vulnerabilities within these models. Consequently, various adversarial attack methods have emerged, each strategically exploiting different aspects of different models. Below, we highlight a subset of these methods, showcasing the multifaceted nature of adversarial attacks on machine learning models:\n\nGenerative Adversarial Networks (GANs) are deep learning models consisting of two networks competing against each other: a generator and a discriminator (Goodfellow et al. 2020). The generator tries to synthesize realistic data while the discriminator evaluates whether they are real or fake. GANs can be used to craft adversarial examples. The generator network is trained to produce inputs that the target model misclassifies. These GAN-generated images can then attack a target classifier or detection model. The generator and the target model are engaged in a competitive process, with the generator continually improving its ability to create deceptive examples and the target model enhancing its resistance to such examples. GANs provide a powerful framework for crafting complex and diverse adversarial inputs, illustrating the adaptability of generative models in the adversarial landscape.\nTransfer Learning Adversarial Attacks exploit the knowledge transferred from a pre-trained model to a target model, creating adversarial examples that can deceive both models. These attacks pose a growing concern, particularly when adversaries have knowledge of the feature extractor but lack access to the classification head (the part or layer responsible for making the final classifications). Referred to as”headless attacks,” these transferable adversarial strategies leverage the expressive capabilities of feature extractors to craft perturbations while oblivious to the label space or training data. The existence of such attacks underscores the importance of developing robust defenses for transfer learning applications, especially since pre-trained models are commonly used (Abdelkader et al. 2020).\n\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. “Generative Adversarial Networks.” Commun. ACM 63 (11): 139–44. https://doi.org/10.1145/3422622.\n\nAbdelkader, Ahmed, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi Schwarzschild, Manli Shu, Christoph Studer, and Chen Zhu. 2020. “Headless Horseman: Adversarial Attacks on Transfer Learning Models.” In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3087–91. IEEE. https://doi.org/10.1109/icassp40776.2020.9053181.\n\nCase Study\nIn 2017, researchers conducted experiments by placing small black and white stickers on stop signs (Eykholt et al. 2017). When viewed by a normal human eye, the stickers did not obscure the sign or prevent interpretability. However, when images of the stickers stop signs were fed into standard traffic sign classification ML models, they were misclassified as speed limit signs over 85% of the time.\n\nEykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2017. “Robust Physical-World Attacks on Deep Learning Models.” ArXiv Preprint abs/1707.08945. https://arxiv.org/abs/1707.08945.\nThis demonstration showed how simple adversarial stickers could trick ML systems into misreading critical road signs. If deployed in the real world, these attacks could endanger public safety, causing autonomous vehicles to misinterpret stop signs as speed limits. Researchers warned this could potentially cause dangerous rolling stops or acceleration into intersections.\nThis case study provides a concrete illustration of how adversarial examples exploit how ML models recognize patterns. By subtly manipulating the input data, attackers can induce incorrect predictions and create serious risks for safety-critical applications like self-driving cars. The attack’s simplicity shows how even minor changes imperceptible to humans can lead models astray. Developers need robust defenses against such threats.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1538,9 +1538,9 @@ "href": "contents/privacy_security/privacy_security.html#security-threats-to-ml-hardware", "title": "14  Security & Privacy", "section": "14.5 Security Threats to ML Hardware", - "text": "14.5 Security Threats to ML Hardware\nDiscussing the threats to embedded ML hardware security in a structured order is useful for a clear and in-depth understanding of the potential pitfalls of ML systems. We will begin with hardware bugs. We address the issues where intrinsic design flaws in the hardware can be a gateway to exploitation. This forms the fundamental knowledge required to understand the genesis of hardware vulnerabilities. Moving to physical attacks establishes the basic threat model, as these are the most overt and direct methods of compromising hardware integrity. Fault-injection attacks naturally extend this discussion, showing how specific manipulations can induce systematic failures.\nAdvancing to side-channel attacks next will show the increasing complexity, as these rely on exploiting indirect information leakages, requiring a nuanced understanding of hardware operations and environmental interactions. Leaky interfaces will show how external communication channels can become vulnerable, leading to accidental data exposures. Counterfeit hardware discussions benefit from prior explorations of hardware integrity and exploitation techniques, as they often compound these issues with additional risks due to their questionable provenance. Finally, supply chain risks encompass all concerns above and frame them within the context of the hardware’s journey from production to deployment, highlighting the multifaceted nature of hardware security and the need for vigilance at every stage.\nTable 14.1 overview table summarizing the topics:\n\n\n\nTable 14.1: Threat types on hardware security.\n\n\n\n\n\n\n\n\n\n\nThreat Type\nDescription\nRelevance to Embedded ML Hardware Security\n\n\n\n\nHardware Bugs\nIntrinsic flaws in hardware designs that can compromise system integrity.\nFoundation of hardware vulnerability.\n\n\nPhysical Attacks\nDirect exploitation of hardware through physical access or manipulation.\nBasic and overt threat model.\n\n\nFault-injection Attacks\nInduction of faults to cause errors in hardware operation, leading to potential system compromise.\nSystematic manipulation leading to failure.\n\n\nSide-Channel Attacks\nExploitation of leaked information from hardware operation to extract sensitive data.\nIndirect attack via environmental observation.\n\n\nLeaky Interfaces\nVulnerabilities arising from interfaces that expose data unintentionally.\nData exposure through communication channels.\n\n\nCounterfeit Hardware\nUse of unauthorized hardware components that may have security flaws.\nCompounded vulnerability issues.\n\n\nSupply Chain Risks\nRisks introduced through the hardware lifecycle, from production to deployment.\nCumulative and multifaceted security challenges.\n\n\n\n\n\n\n\n14.5.1 Hardware Bugs\nHardware is not immune to the pervasive issue of design flaws or bugs. Attackers can exploit these vulnerabilities to access, manipulate, or extract sensitive data, breaching the confidentiality and integrity that users and services depend on. An example of such vulnerabilities came to light with the discovery of Meltdown and Spectre—two hardware vulnerabilities that exploit critical vulnerabilities in modern processors. These bugs allow attackers to bypass the hardware barrier that separates applications, allowing a malicious program to read the memory of other programs and the operating system.\nMeltdown (Kocher et al. 2019a) and Spectre (Kocher et al. 2019b) work by taking advantage of optimizations in modern CPUs that allow them to speculatively execute instructions out of order before validity checks have been completed. This reveals data that should be inaccessible, which the attack captures through side channels like caches. The technical complexity demonstrates the difficulty of eliminating vulnerabilities even with extensive validation.\n\n———, et al. 2019a. “Spectre Attacks: Exploiting Speculative Execution.” In 2019 IEEE Symposium on Security and Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\n\nKocher, Paul, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, et al. 2019b. “Spectre Attacks: Exploiting Speculative Execution.” In 2019 IEEE Symposium on Security and Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\nIf an ML system is processing sensitive data, such as personal user information or proprietary business analytics, Meltdown and Spectre represent a real and present danger to data security. Consider the case of an ML accelerator card designed to speed up machine learning processes, such as the ones we discussed in the A.I. Hardware chapter. These accelerators work with the CPU to handle complex calculations, often related to data analytics, image recognition, and natural language processing. If such an accelerator card has a vulnerability akin to Meltdown or Spectre, it could leak the data it processes. An attacker could exploit this flaw not just to siphon off data but also to gain insights into the ML model’s workings, including potentially reverse-engineering the model itself (thus, going back to the issue of model theft.\nA real-world scenario where this could be devastating would be in the healthcare industry. ML systems routinely process highly sensitive patient data to help diagnose, plan treatment, and forecast outcomes. A bug in the system’s hardware could lead to the unauthorized disclosure of personal health information, violating patient privacy and contravening strict regulatory standards like the Health Insurance Portability and Accountability Act (HIPAA)\nThe Meltdown and Spectre vulnerabilities are stark reminders that hardware security is not just about preventing unauthorized physical access but also about ensuring that the hardware’s architecture does not become a conduit for data exposure. Similar hardware design flaws regularly emerge in CPUs, accelerators, memory, buses, and other components. This necessitates ongoing retroactive mitigations and performance tradeoffs in deployed systems. Proactive solutions like confidential computing architectures could mitigate entire classes of vulnerabilities through fundamentally more secure hardware design. Thwarting hardware bugs requires rigor at every design stage, validation, and deployment.\n\n\n14.5.2 Physical Attacks\nPhysical tampering refers to the direct, unauthorized manipulation of physical computing resources to undermine the integrity of machine learning systems. It’s a particularly insidious attack because it circumvents traditional cybersecurity measures, which often focus more on software vulnerabilities than hardware threats.\nPhysical tampering can take many forms, from the relatively simple, such as someone inserting a USB device loaded with malicious software into a server, to the highly sophisticated, such as embedding a hardware Trojan during the manufacturing process of a microchip (discussed later in greater detail in the Supply Chain section). ML systems are susceptible to this attack because they rely on the accuracy and integrity of their hardware to process and analyze vast amounts of data correctly.\nConsider an ML-powered drone used for geographical mapping. The drone’s operation relies on a series of onboard systems, including a navigation module that processes inputs from various sensors to determine its path. If an attacker gains physical access to this drone, they could replace the genuine navigation module with a compromised one that includes a backdoor. This manipulated module could then alter the drone’s flight path to conduct surveillance over restricted areas or even smuggle contraband by flying undetected routes.\nAnother example is the physical tampering of biometric scanners used for access control in secure facilities. By introducing a modified sensor that transmits biometric data to an unauthorized receiver, an attacker can access personal identification data to authenticate individuals.\nThere are several ways that physical tampering can occur in ML hardware:\n\nManipulating sensors: Consider an autonomous vehicle that relies on cameras and LiDAR for situational awareness. An attacker could carefully calibrate the physical alignment of these sensors to introduce blindspots or distort critical distances. This could impair object detection and endanger passengers.\nHardware trojans: Malicious circuit modifications can introduce trojans that activate under certain inputs. For example, an ML accelerator chip could function normally until a rare trigger case occurs, causing it to accelerate unsafely.\nTampering with memory: Physically exposing and manipulating memory chips could allow the extraction of encrypted ML model parameters. Fault injection techniques can also corrupt model data to degrade accuracy.\nIntroducing backdoors: Gaining physical access to servers, an adversary could use hardware keyloggers to capture passwords and create backdoor accounts for persistent access. These could then be used to exfiltrate ML training data over time.\nSupply chain attacks: Manipulating third-party hardware components or compromising manufacturing and shipping channels creates systemic vulnerabilities that are difficult to detect and remediate.\n\n\n\n14.5.3 Fault-injection Attacks\nBy intentionally introducing faults into ML hardware, attackers can induce errors in the computational process, leading to incorrect outputs. This manipulation compromises the integrity of ML operations and can serve as a vector for further exploitation, such as system reverse engineering or security protocol bypass. Fault injection involves intentionally disrupting normal computations in a system through external interference (Joye and Tunstall 2012). By precisely triggering computational errors, adversaries can alter program execution in ways that degrade reliability or leak sensitive information.\n\nJoye, Marc, and Michael Tunstall. 2012. Fault Analysis in Cryptography. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-29656-7.\n\nBarenghi, Alessandro, Guido M. Bertoni, Luca Breveglieri, Mauro Pellicioli, and Gerardo Pelosi. 2010. “Low Voltage Fault Attacks to AES.” In 2010 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 7–12. IEEE; IEEE. https://doi.org/10.1109/hst.2010.5513121.\n\nHutter, Michael, Jorn-Marc Schmidt, and Thomas Plos. 2009. “Contact-Based Fault Injections and Power Analysis on RFID Tags.” In 2009 European Conference on Circuit Theory and Design, 409–12. IEEE; IEEE. https://doi.org/10.1109/ecctd.2009.5275012.\n\nAmiel, Frederic, Christophe Clavier, and Michael Tunstall. 2006. “Fault Analysis of DPA-Resistant Algorithms.” In International Workshop on Fault Diagnosis and Tolerance in Cryptography, 223–36. Springer.\n\nAgrawal, Dakshi, Selcuk Baktir, Deniz Karakoyunlu, Pankaj Rohatgi, and Berk Sunar. 2007. “Trojan Detection Using IC Fingerprinting.” In 2007 IEEE Symposium on Security and Privacy (SP ’07), 29–45. Springer; IEEE. https://doi.org/10.1109/sp.2007.36.\n\nSkorobogatov, Sergei. 2009. “Local Heating Attacks on Flash Memory Devices.” In 2009 IEEE International Workshop on Hardware-Oriented Security and Trust, 1–6. IEEE; IEEE. https://doi.org/10.1109/hst.2009.5225028.\n\nSkorobogatov, Sergei P, and Ross J Anderson. 2003. “Optical Fault Induction Attacks.” In Cryptographic Hardware and Embedded Systems-CHES 2002: 4th International Workshop Redwood Shores, CA, USA, August 1315, 2002 Revised Papers 4, 2–12. Springer.\nVarious physical tampering techniques can be used for fault injection. Low voltage (Barenghi et al. 2010), power spikes (Hutter, Schmidt, and Plos 2009), clock glitches (Amiel, Clavier, and Tunstall 2006), electromagnetic pulses (Agrawal et al. 2007), temperate increase (S. Skorobogatov 2009) and laser strikes (S. P. Skorobogatov and Anderson 2003) are common hardware attack vectors. They are precisely timed to induce faults like flipped bits or skipped instructions during key operations.\nFor ML systems, consequences include impaired model accuracy, denial of service, extraction of private training data or model parameters, and reverse engineering of model architectures. Attackers could use fault injection to force misclassifications, disrupt autonomous systems, or steal intellectual property.\nFor example, in (Breier et al. 2018), the authors successfully injected a fault attack into a deep neural network deployed on a microcontroller. They used a laser to heat specific transistors, forcing them to switch states. In one instance, they used this method to attack a ReLU activation function, resulting in the function always outputting a value of 0, regardless of the input. In the assembly code in Figure 14.2, the attack caused the executing program to always skip the jmp end instruction on line 6. This means that HiddenLayerOutput[i] is always set to 0, overwriting any values written to it on lines 4 and 5. As a result, the targeted neurons are rendered inactive, resulting in misclassifications.\n\n\n\n\n\n\nFigure 14.2: Fault-injection demonstrated with assembly code. Credit: Breier et al. (2018).\n\n\nBreier, Jakub, Xiaolu Hou, Dirmanto Jap, Lei Ma, Shivam Bhasin, and Yang Liu. 2018. “Deeplaser: Practical Fault Attack on Deep Neural Networks.” ArXiv Preprint abs/1806.05859. https://arxiv.org/abs/1806.05859.\n\n\nAn attacker’s strategy could be to infer information about the activation functions using side-channel attacks (discussed next). Then, the attacker could attempt to target multiple activation function computations by randomly injecting faults into the layers as close to the output layer as possible, increasing the likelihood and impact of the attack.\nEmbedded devices are particularly vulnerable due to limited physical hardening and resource constraints that restrict robust runtime defenses. Without tamper-resistant packaging, attacker access to system buses and memory enables precise fault strikes. Lightweight embedded ML models also lack redundancy to overcome errors.\nThese attacks can be particularly insidious because they bypass traditional software-based security measures, often not accounting for physical disruptions. Furthermore, because ML systems rely heavily on the accuracy and reliability of their hardware for tasks like pattern recognition, decision-making, and automated responses, any compromise in their operation due to fault injection can have serious and wide-ranging consequences.\nMitigating fault injection risks necessitates a multilayer approach. Physical hardening through tamper-proof enclosures and design obfuscation helps reduce access. Lightweight anomaly detection can identify unusual sensor inputs or erroneous model outputs (Hsiao et al. 2023). Error-correcting memories minimize disruption, while data encryption safeguards information. Emerging model watermarking techniques trace stolen parameters.\n\nHsiao, Yu-Shun, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman Mahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, and Vijay Janapa Reddi. 2023. “MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.” In 2023 Design, Automation &Amp; Test in Europe Conference &Amp; Exhibition (DATE), 1–6. IEEE; IEEE. https://doi.org/10.23919/date56975.2023.10137246.\nHowever, balancing robust protections with embedded systems’ tight size and power limits remains challenging. Cryptography limits and lack of secure co-processors on cost-sensitive embedded hardware restrict options. Ultimately, fault injection resilience demands a cross-layer perspective spanning electrical, firmware, software, and physical design layers.\n\n\n14.5.4 Side-Channel Attacks\nSide-channel attacks are a category of security breach that depends on information gained from a computer system’s physical implementation. Unlike direct attacks on software or network vulnerabilities, side-channel attacks exploit a system’s hardware characteristics. These attacks can be particularly effective against complex machine learning systems, where large amounts of data are processed, and a high level of security is expected.\nThe fundamental premise of a side-channel attack is that a device’s operation can inadvertently leak information. Such leaks can come from various sources, including the electrical power a device consumes (Kocher, Jaffe, and Jun 1999), the electromagnetic fields it emits (Gandolfi, Mourtel, and Olivier 2001), the time it takes to process certain operations, or even the sounds it produces. Each channel can indirectly glimpse the system’s internal processes, revealing information that can compromise security.\n\nKocher, Paul, Joshua Jaffe, and Benjamin Jun. 1999. “Differential Power Analysis.” In Advances in CryptologyCRYPTO’99: 19th Annual International Cryptology Conference Santa Barbara, California, USA, August 1519, 1999 Proceedings 19, 388–97. Springer.\n\nGandolfi, Karine, Christophe Mourtel, and Francis Olivier. 2001. “Electromagnetic Analysis: Concrete Results.” In Cryptographic Hardware and Embedded SystemsCHES 2001: Third International Workshop Paris, France, May 1416, 2001 Proceedings 3, 251–61. Springer.\n\nKocher, Paul, Joshua Jaffe, Benjamin Jun, and Pankaj Rohatgi. 2011. “Introduction to Differential Power Analysis.” Journal of Cryptographic Engineering 1 (1): 5–27. https://doi.org/10.1007/s13389-011-0006-y.\nFor instance, consider a machine learning system performing encrypted transactions. Encryption algorithms are supposed to secure data but require computational work to encrypt and decrypt information. An attacker can analyze the power consumption patterns of the device performing encryption to figure out the cryptographic key. With sophisticated statistical methods, small variations in power usage during the encryption process can be correlated with the data being processed, eventually revealing the key. Some differential analysis attack techniques are Differential Power Analysis (DPA) (Kocher et al. 2011), Differential Electromagnetic Analysis (DEMA), and Correlation Power Analysis (CPA).\nFor example, consider an attacker trying to break the AES encryption algorithm using a differential analysis attack. The attacker would first need to collect many power or electromagnetic traces (a trace is a record of consumptions or emissions) of the device while performing AES encryption.\nOnce the attacker has collected sufficient traces, they would then use a statistical technique to identify correlations between the traces and the different values of the plaintext (original, unencrypted text) and ciphertext (encrypted text). These correlations would then be used to infer the value of a bit in the AES key and, eventually, the entire key. Differential analysis attacks are dangerous because they are low-cost, effective, and non-intrusive, allowing attackers to bypass algorithmic and hardware-level security measures. Compromises by these attacks are also hard to detect because they do not physically modify the device or break the encryption algorithm.\nBelow is a simplified visualization of how analyzing the power consumption patterns of the encryption device can help us extract information about the algorithm’s operations and, in turn, the secret data. Say we have a device that takes a 5-byte password as input. We will analyze and compare the different voltage patterns that are measured while the encryption device performs operations on the input to authenticate the password.\nFirst, consider the power analysis of the device’s operations after entering a correct password in the first picture in Figure 14.3. The dense blue graph outputs the encryption device’s voltage measurement. What matters here is the comparison between the different analysis charts rather than the specific details of what is going on in each scenario.\n\n\n\n\n\n\nFigure 14.3: Power analysis of an encryption device with a correct password. Credit: Colin O’Flynn.\n\n\n\nLet’s look at the power analysis chart when we enter an incorrect password in Figure 14.4. The first three bytes of the password are correct. As a result, we can see that the voltage patterns are very similar or identical between the two charts, up to and including the fourth byte. After the device processes the fourth byte, it determines a mismatch between the secret key and the attempted input. We notice a change in the pattern at the transition point between the fourth and fifth bytes: the voltage has gone up (the current has gone down) because the device has stopped processing the rest of the input.\n\n\n\n\n\n\nFigure 14.4: Power analysis of an encryption device with a (partially) wrong password. Credit: Colin O’Flynn.\n\n\n\nFigure 14.5 describes another chart of a completely wrong password. After the device finishes processing the first byte, it determines that it is incorrect and stops further processing - the voltage goes up and the current down.\n\n\n\n\n\n\nFigure 14.5: Power analysis of an encryption device with a wrong password. Credit: Colin O’Flynn.\n\n\n\nThe example above shows how we can infer information about the encryption process and the secret key by analyzing different inputs and trying to ‘eavesdrop’ on the device’s operations on each input byte.\nFor a more detailed explanation, watch the video below.\n\nAnother example is an ML system for speech recognition, which processes voice commands to perform actions. By measuring the time it takes for the system to respond to commands or the power used during processing, an attacker could infer what commands are being processed and thus learn about the system’s operational patterns. Even more subtle, the sound emitted by a computer’s fan or hard drive could change in response to the workload, which a sensitive microphone could pick up and analyze to determine what kind of operations are being performed.\nIn real-world scenarios, side-channel attacks have been used to extract encryption keys and compromise secure communications. One of the earliest recorded side-channel attacks dates back to the 1960s when British intelligence agency MI5 faced the challenge of deciphering encrypted communications from the Egyptian Embassy in London. Their cipher-breaking attempts were thwarted by the computational limitations of the time until an ingenious observation changed the game.\nMI5 agent Peter Wright proposed using a microphone to capture the subtle acoustic signatures emitted from the embassy’s rotor cipher machine during encryption (Burnet and Thomas 1989). The distinct mechanical clicks of the rotors as operators configured them daily leaked critical information about the initial settings. This simple side channel of sound enabled MI5 to dramatically reduce the complexity of deciphering messages. This early acoustic leak attack highlights that side-channel attacks are not merely a digital age novelty but a continuation of age-old cryptanalytic principles. The notion that where there is a signal, there is an opportunity for interception remains foundational. From mechanical clicks to electrical fluctuations and beyond, side channels enable adversaries to extract secrets indirectly through careful signal analysis.\n\nBurnet, David, and Richard Thomas. 1989. “Spycatcher: The Commodification of Truth.” J. Law Soc. 16 (2): 210. https://doi.org/10.2307/1410360.\n\nAsonov, D., and R. Agrawal. 2004. “Keyboard Acoustic Emanations.” In IEEE Symposium on Security and Privacy, 2004. Proceedings. 2004, 3–11. IEEE; IEEE. https://doi.org/10.1109/secpri.2004.1301311.\n\nGnad, Dennis R. E., Fabian Oboril, and Mehdi B. Tahoori. 2017. “Voltage Drop-Based Fault Attacks on FPGAs Using Valid Bitstreams.” In 2017 27th International Conference on Field Programmable Logic and Applications (FPL), 1–7. IEEE; IEEE. https://doi.org/10.23919/fpl.2017.8056840.\n\nZhao, Mark, and G. Edward Suh. 2018. “FPGA-Based Remote Power Side-Channel Attacks.” In 2018 IEEE Symposium on Security and Privacy (SP), 229–44. IEEE; IEEE. https://doi.org/10.1109/sp.2018.00049.\nToday, acoustic cryptanalysis has evolved into attacks like keyboard eavesdropping (Asonov and Agrawal 2004). Electrical side channels range from power analysis on cryptographic hardware (Gnad, Oboril, and Tahoori 2017) to voltage fluctuations (Zhao and Suh 2018) on machine learning accelerators. Timing, electromagnetic emission, and even heat footprints can likewise be exploited. New and unexpected side channels often emerge as computing becomes more interconnected and miniaturized.\nJust as MI5’s analog acoustic leak transformed their codebreaking, modern side-channel attacks circumvent traditional boundaries of cyber defense. Understanding the creative spirit and historical persistence of side channel exploits is key knowledge for developers and defenders seeking to secure modern machine learning systems comprehensively against digital and physical threats.\n\n\n14.5.5 Leaky Interfaces\nLeaky interfaces in embedded systems are often overlooked backdoors that can become significant security vulnerabilities. While designed for legitimate purposes such as communication, maintenance, or debugging, these interfaces may inadvertently provide attackers with a window through which they can extract sensitive information or inject malicious data.\nAn interface becomes “leaky” when it exposes more information than it should, often due to a lack of stringent access controls or inadequate shielding of the transmitted data. Here are some real-world examples of leaky interface issues causing security problems in IoT and embedded devices:\n\nBaby Monitors: Many WiFi-enabled baby monitors have been found to have unsecured interfaces for remote access. This allowed attackers to gain live audio and video feeds from people’s homes, representing a major privacy violation.\nPacemakers: Interface vulnerabilities were discovered in some pacemakers that could allow attackers to manipulate cardiac functions if exploited. This presents a potentially life-threatening scenario.\nSmart Lightbulbs: A researcher found he could access unencrypted data from smart lightbulbs via a debug interface, including WiFi credentials, allowing him to gain access to the connected network (Greengard 2015).\nSmart Cars: If left unsecured, The OBD-II diagnostic port has been shown to provide an attack vector into automotive systems. Researchers could use it to control brakes and other components (Miller and Valasek 2015).\n\n\nGreengard, Samuel. 2015. The Internet of Things. The MIT Press. https://doi.org/10.7551/mitpress/10277.001.0001.\n\nMiller, Charlie, and Chris Valasek. 2015. “Remote Exploitation of an Unaltered Passenger Vehicle.” Black Hat USA 2015 (S 91): 1–91.\nWhile the above are not directly connected with ML, consider the example of a smart home system with an embedded ML component that controls home security based on behavior patterns it learns over time. The system includes a maintenance interface accessible via the local network for software updates and system checks. If this interface does not require strong authentication or the data transmitted through it is not encrypted, an attacker on the same network could gain access. They could then eavesdrop on the homeowner’s daily routines or reprogram the security settings by manipulating the firmware.\nSuch leaks are a privacy issue and a potential entry point for more damaging exploits. The exposure of training data, model parameters, or ML outputs from a leak could help adversaries construct adversarial examples or reverse-engineer models. Access through a leaky interface could also be used to alter an embedded device’s firmware, loading it with malicious code that could turn off the device, intercept data, or use it in botnet attacks.\nTo mitigate these risks, a multi-layered approach is necessary, spanning technical controls like authentication, encryption, anomaly detection, policies and processes like interface inventories, access controls, auditing, and secure development practices. Turning off unnecessary interfaces and compartmentalizing risks via a zero-trust model provide additional protection.\nAs designers of embedded ML systems, we should assess interfaces early in development and continually monitor them post-deployment as part of an end-to-end security lifecycle. Understanding and securing interfaces is crucial for ensuring the overall security of embedded ML.\n\n\n14.5.6 Counterfeit Hardware\nML systems are only as reliable as the underlying hardware. In an era where hardware components are global commodities, the rise of counterfeit or cloned hardware presents a significant challenge. Counterfeit hardware encompasses any components that are unauthorized reproductions of original parts. Counterfeit components infiltrate ML systems through complex supply chains that stretch across borders and involve numerous stages from manufacture to delivery.\nA single lapse in the supply chain’s integrity can result in the insertion of counterfeit parts designed to closely imitate the functions and appearance of genuine hardware. For instance, a facial recognition system for high-security access control may be compromised if equipped with counterfeit processors. These processors could fail to accurately process and verify biometric data, potentially allowing unauthorized individuals to access restricted areas.\nThe challenge with counterfeit hardware is multifaceted. It undermines the quality and reliability of ML systems, as these components may degrade faster or perform unpredictably due to substandard manufacturing. The security risks are also profound; counterfeit hardware can contain vulnerabilities ripe for exploitation by malicious actors. For example, a cloned network router in an ML data center might include a hidden backdoor, enabling data interception or network intrusion without detection.\nFurthermore, counterfeit hardware poses legal and compliance risks. Companies inadvertently utilizing counterfeit parts in their ML systems may face serious legal repercussions, including fines and sanctions for failing to comply with industry regulations and standards. This is particularly true for sectors where compliance with specific safety and privacy regulations is mandatory, such as healthcare and finance.\nThe issue of counterfeit hardware is exacerbated by economic pressures to reduce costs, which can compel businesses to source from lower-cost suppliers without stringent verification processes. This economizing can inadvertently introduce counterfeit parts into otherwise secure systems. Additionally, detecting these counterfeits is inherently difficult since they are created to pass as the original components, often requiring sophisticated equipment and expertise to identify.\nIn ML, where decisions are made in real time and based on complex computations, the consequences of hardware failure are inconvenient and potentially dangerous. Stakeholders in the field of ML need to understand these risks thoroughly. The issues presented by counterfeit hardware necessitate a deep dive into the current challenges facing ML system integrity and emphasize the importance of vigilant, informed management of the hardware life cycle within these advanced systems.\n\n\n14.5.7 Supply Chain Risks\nThe threat of counterfeit hardware is closely tied to broader supply chain vulnerabilities. Globalized, interconnected supply chains create multiple opportunities for compromised components to infiltrate a product’s lifecycle. Supply chains involve numerous entities, from design to manufacturing, assembly, distribution, and integration. A lack of transparency and oversight of each partner makes verifying integrity at every step challenging. Lapses anywhere along the chain can allow the insertion of counterfeit parts.\nFor example, a contracted manufacturer may unknowingly receive and incorporate recycled electronic waste containing dangerous counterfeits. An untrustworthy distributor could smuggle in cloned components. Insider threats at any vendor might deliberately mix counterfeits into legitimate shipments.\nOnce counterfeits enter the supply stream, they move quickly through multiple hands before ending up in ML systems where detection is difficult. Advanced counterfeits like refurbished parts or clones with repackaged externals can masquerade as authentic components, passing visual inspection.\nTo identify fakes, thorough technical profiling using micrography, X-ray screening, component forensics, and functional testing is often required. However, such costly analysis is impractical for large-volume procurement.\nStrategies like supply chain audits, screening suppliers, validating component provenance, and adding tamper-evident protections can help mitigate risks. However, given global supply chain security challenges, a zero-trust approach is prudent. Designing ML systems to utilize redundant checking, fail-safes, and continuous runtime monitoring provides resilience against component compromises.\nRigorous validation of hardware sources coupled with fault-tolerant system architectures offers the most robust defense against the pervasive risks of convoluted, opaque global supply chains.\n\nCase Study\nIn 2018, Bloomberg Businessweek published an alarming story that got much attention in the tech world. The article claimed that Supermicro had secretly planted tiny spy chips on server hardware. Reporters said Chinese state hackers working with Supermicro could sneak these tiny chips onto motherboards during manufacturing. The tiny chips allegedly gave the hackers backdoor access to servers used by over 30 major companies, including Apple and Amazon.\nIf true, this would allow hackers to spy on private data or even tamper with systems. However, after investigating, Apple and Amazon found no proof that such hacked Supermicro hardware existed. Other experts questioned whether the Bloomberg article was accurate reporting.\nWhether the story is completely true or not is not our concern from a pedagogical viewpoint. However, this incident drew attention to the risks of global supply chains for hardware, especially manufactured in China. When companies outsource and buy hardware components from vendors worldwide, there needs to be more visibility into the process. In this complex global pipeline, there are concerns that counterfeits or tampered hardware could be slipped in somewhere along the way without tech companies realizing it. Companies relying too much on single manufacturers or distributors creates risk. For instance, due to the over-reliance on TSMC for semiconductor manufacturing, the U.S. has invested 50 billion dollars into the CHIPS Act.\nAs ML moves into more critical systems, verifying hardware integrity from design through production and delivery is crucial. The reported Supermicro backdoor demonstrated that for ML security, we cannot take global supply chains and manufacturing for granted. We must inspect and validate hardware at every link in the chain.", + "text": "14.5 Security Threats to ML Hardware\nDiscussing the threats to embedded ML hardware security in a structured order is useful for a clear and in-depth understanding of the potential pitfalls of ML systems. We will begin with hardware bugs. We address the issues where intrinsic design flaws in the hardware can be a gateway to exploitation. This forms the fundamental knowledge required to understand the genesis of hardware vulnerabilities. Moving to physical attacks establishes the basic threat model, as these are the most overt and direct methods of compromising hardware integrity. Fault-injection attacks naturally extend this discussion, showing how specific manipulations can induce systematic failures.\nAdvancing to side-channel attacks next will show the increasing complexity, as these rely on exploiting indirect information leakages, requiring a nuanced understanding of hardware operations and environmental interactions. Leaky interfaces will show how external communication channels can become vulnerable, leading to accidental data exposures. Counterfeit hardware discussions benefit from prior explorations of hardware integrity and exploitation techniques, as they often compound these issues with additional risks due to their questionable provenance. Finally, supply chain risks encompass all concerns above and frame them within the context of the hardware’s journey from production to deployment, highlighting the multifaceted nature of hardware security and the need for vigilance at every stage.\nTable 14.1 overview table summarizing the topics:\n\n\n\nTable 14.1: Threat types on hardware security.\n\n\n\n\n\n\n\n\n\n\nThreat Type\nDescription\nRelevance to Embedded ML Hardware Security\n\n\n\n\nHardware Bugs\nIntrinsic flaws in hardware designs that can compromise system integrity.\nFoundation of hardware vulnerability.\n\n\nPhysical Attacks\nDirect exploitation of hardware through physical access or manipulation.\nBasic and overt threat model.\n\n\nFault-injection Attacks\nInduction of faults to cause errors in hardware operation, leading to potential system compromise.\nSystematic manipulation leading to failure.\n\n\nSide-Channel Attacks\nExploitation of leaked information from hardware operation to extract sensitive data.\nIndirect attack via environmental observation.\n\n\nLeaky Interfaces\nVulnerabilities arising from interfaces that expose data unintentionally.\nData exposure through communication channels.\n\n\nCounterfeit Hardware\nUse of unauthorized hardware components that may have security flaws.\nCompounded vulnerability issues.\n\n\nSupply Chain Risks\nRisks introduced through the hardware lifecycle, from production to deployment.\nCumulative and multifaceted security challenges.\n\n\n\n\n\n\n\n14.5.1 Hardware Bugs\nHardware is not immune to the pervasive issue of design flaws or bugs. Attackers can exploit these vulnerabilities to access, manipulate, or extract sensitive data, breaching the confidentiality and integrity that users and services depend on. An example of such vulnerabilities came to light with the discovery of Meltdown and Spectre—two hardware vulnerabilities that exploit critical vulnerabilities in modern processors. These bugs allow attackers to bypass the hardware barrier that separates applications, allowing a malicious program to read the memory of other programs and the operating system.\nMeltdown (Kocher et al. 2019a) and Spectre (Kocher et al. 2019b) work by taking advantage of optimizations in modern CPUs that allow them to speculatively execute instructions out of order before validity checks have been completed. This reveals data that should be inaccessible, which the attack captures through side channels like caches. The technical complexity demonstrates the difficulty of eliminating vulnerabilities even with extensive validation.\n\n———, et al. 2019a. “Spectre Attacks: Exploiting Speculative Execution.” In 2019 IEEE Symposium on Security and Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\n\nKocher, Paul, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, et al. 2019b. “Spectre Attacks: Exploiting Speculative Execution.” In 2019 IEEE Symposium on Security and Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\nIf an ML system is processing sensitive data, such as personal user information or proprietary business analytics, Meltdown and Spectre represent a real and present danger to data security. Consider the case of an ML accelerator card designed to speed up machine learning processes, such as the ones we discussed in the A.I. Hardware chapter. These accelerators work with the CPU to handle complex calculations, often related to data analytics, image recognition, and natural language processing. If such an accelerator card has a vulnerability akin to Meltdown or Spectre, it could leak the data it processes. An attacker could exploit this flaw not just to siphon off data but also to gain insights into the ML model’s workings, including potentially reverse-engineering the model itself (thus, going back to the issue of model theft.\nA real-world scenario where this could be devastating would be in the healthcare industry. ML systems routinely process highly sensitive patient data to help diagnose, plan treatment, and forecast outcomes. A bug in the system’s hardware could lead to the unauthorized disclosure of personal health information, violating patient privacy and contravening strict regulatory standards like the Health Insurance Portability and Accountability Act (HIPAA)\nThe Meltdown and Spectre vulnerabilities are stark reminders that hardware security is not just about preventing unauthorized physical access but also about ensuring that the hardware’s architecture does not become a conduit for data exposure. Similar hardware design flaws regularly emerge in CPUs, accelerators, memory, buses, and other components. This necessitates ongoing retroactive mitigations and performance tradeoffs in deployed systems. Proactive solutions like confidential computing architectures could mitigate entire classes of vulnerabilities through fundamentally more secure hardware design. Thwarting hardware bugs requires rigor at every design stage, validation, and deployment.\n\n\n14.5.2 Physical Attacks\nPhysical tampering refers to the direct, unauthorized manipulation of physical computing resources to undermine the integrity of machine learning systems. It’s a particularly insidious attack because it circumvents traditional cybersecurity measures, which often focus more on software vulnerabilities than hardware threats.\nPhysical tampering can take many forms, from the relatively simple, such as someone inserting a USB device loaded with malicious software into a server, to the highly sophisticated, such as embedding a hardware Trojan during the manufacturing process of a microchip (discussed later in greater detail in the Supply Chain section). ML systems are susceptible to this attack because they rely on the accuracy and integrity of their hardware to process and analyze vast amounts of data correctly.\nConsider an ML-powered drone used for geographical mapping. The drone’s operation relies on a series of onboard systems, including a navigation module that processes inputs from various sensors to determine its path. If an attacker gains physical access to this drone, they could replace the genuine navigation module with a compromised one that includes a backdoor. This manipulated module could then alter the drone’s flight path to conduct surveillance over restricted areas or even smuggle contraband by flying undetected routes.\nAnother example is the physical tampering of biometric scanners used for access control in secure facilities. By introducing a modified sensor that transmits biometric data to an unauthorized receiver, an attacker can access personal identification data to authenticate individuals.\nThere are several ways that physical tampering can occur in ML hardware:\n\nManipulating sensors: Consider an autonomous vehicle that relies on cameras and LiDAR for situational awareness. An attacker could carefully calibrate the physical alignment of these sensors to introduce blindspots or distort critical distances. This could impair object detection and endanger passengers.\nHardware trojans: Malicious circuit modifications can introduce trojans that activate under certain inputs. For example, an ML accelerator chip could function normally until a rare trigger case occurs, causing it to accelerate unsafely.\nTampering with memory: Physically exposing and manipulating memory chips could allow the extraction of encrypted ML model parameters. Fault injection techniques can also corrupt model data to degrade accuracy.\nIntroducing backdoors: Gaining physical access to servers, an adversary could use hardware keyloggers to capture passwords and create backdoor accounts for persistent access. These could then be used to exfiltrate ML training data over time.\nSupply chain attacks: Manipulating third-party hardware components or compromising manufacturing and shipping channels creates systemic vulnerabilities that are difficult to detect and remediate.\n\n\n\n14.5.3 Fault-injection Attacks\nBy intentionally introducing faults into ML hardware, attackers can induce errors in the computational process, leading to incorrect outputs. This manipulation compromises the integrity of ML operations and can serve as a vector for further exploitation, such as system reverse engineering or security protocol bypass. Fault injection involves intentionally disrupting normal computations in a system through external interference (Joye and Tunstall 2012). By precisely triggering computational errors, adversaries can alter program execution in ways that degrade reliability or leak sensitive information.\n\nJoye, Marc, and Michael Tunstall. 2012. Fault Analysis in Cryptography. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-29656-7.\n\nBarenghi, Alessandro, Guido M. Bertoni, Luca Breveglieri, Mauro Pellicioli, and Gerardo Pelosi. 2010. “Low Voltage Fault Attacks to AES.” In 2010 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 7–12. IEEE; IEEE. https://doi.org/10.1109/hst.2010.5513121.\n\nHutter, Michael, Jorn-Marc Schmidt, and Thomas Plos. 2009. “Contact-Based Fault Injections and Power Analysis on RFID Tags.” In 2009 European Conference on Circuit Theory and Design, 409–12. IEEE; IEEE. https://doi.org/10.1109/ecctd.2009.5275012.\n\nAmiel, Frederic, Christophe Clavier, and Michael Tunstall. 2006. “Fault Analysis of DPA-Resistant Algorithms.” In International Workshop on Fault Diagnosis and Tolerance in Cryptography, 223–36. Springer.\n\nAgrawal, Dakshi, Selcuk Baktir, Deniz Karakoyunlu, Pankaj Rohatgi, and Berk Sunar. 2007. “Trojan Detection Using IC Fingerprinting.” In 2007 IEEE Symposium on Security and Privacy (SP ’07), 29–45. Springer; IEEE. https://doi.org/10.1109/sp.2007.36.\n\nSkorobogatov, Sergei. 2009. “Local Heating Attacks on Flash Memory Devices.” In 2009 IEEE International Workshop on Hardware-Oriented Security and Trust, 1–6. IEEE; IEEE. https://doi.org/10.1109/hst.2009.5225028.\n\nSkorobogatov, Sergei P, and Ross J Anderson. 2003. “Optical Fault Induction Attacks.” In Cryptographic Hardware and Embedded Systems-CHES 2002: 4th International Workshop Redwood Shores, CA, USA, August 1315, 2002 Revised Papers 4, 2–12. Springer.\nVarious physical tampering techniques can be used for fault injection. Low voltage (Barenghi et al. 2010), power spikes (Hutter, Schmidt, and Plos 2009), clock glitches (Amiel, Clavier, and Tunstall 2006), electromagnetic pulses (Agrawal et al. 2007), temperate increase (S. Skorobogatov 2009) and laser strikes (S. P. Skorobogatov and Anderson 2003) are common hardware attack vectors. They are precisely timed to induce faults like flipped bits or skipped instructions during key operations.\nFor ML systems, consequences include impaired model accuracy, denial of service, extraction of private training data or model parameters, and reverse engineering of model architectures. Attackers could use fault injection to force misclassifications, disrupt autonomous systems, or steal intellectual property.\nFor example, in (Breier et al. 2018), the authors successfully injected a fault attack into a deep neural network deployed on a microcontroller. They used a laser to heat specific transistors, forcing them to switch states. In one instance, they used this method to attack a ReLU activation function, resulting in the function always outputting a value of 0, regardless of the input. In the assembly code in Figure 14.2, the attack caused the executing program to always skip the jmp end instruction on line 6. This means that HiddenLayerOutput[i] is always set to 0, overwriting any values written to it on lines 4 and 5. As a result, the targeted neurons are rendered inactive, resulting in misclassifications.\n\n\n\n\n\n\nFigure 14.2: Fault-injection demonstrated with assembly code. Credit: Breier et al. (2018).\n\n\nBreier, Jakub, Xiaolu Hou, Dirmanto Jap, Lei Ma, Shivam Bhasin, and Yang Liu. 2018. “Deeplaser: Practical Fault Attack on Deep Neural Networks.” ArXiv Preprint abs/1806.05859. https://arxiv.org/abs/1806.05859.\n\n\nAn attacker’s strategy could be to infer information about the activation functions using side-channel attacks (discussed next). Then, the attacker could attempt to target multiple activation function computations by randomly injecting faults into the layers as close to the output layer as possible, increasing the likelihood and impact of the attack.\nEmbedded devices are particularly vulnerable due to limited physical hardening and resource constraints that restrict robust runtime defenses. Without tamper-resistant packaging, attacker access to system buses and memory enables precise fault strikes. Lightweight embedded ML models also lack redundancy to overcome errors.\nThese attacks can be particularly insidious because they bypass traditional software-based security measures, often not accounting for physical disruptions. Furthermore, because ML systems rely heavily on the accuracy and reliability of their hardware for tasks like pattern recognition, decision-making, and automated responses, any compromise in their operation due to fault injection can have serious and wide-ranging consequences.\nMitigating fault injection risks necessitates a multilayer approach. Physical hardening through tamper-proof enclosures and design obfuscation helps reduce access. Lightweight anomaly detection can identify unusual sensor inputs or erroneous model outputs (Hsiao et al. 2023). Error-correcting memories minimize disruption, while data encryption safeguards information. Emerging model watermarking techniques trace stolen parameters.\n\nHsiao, Yu-Shun, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman Mahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, and Vijay Janapa Reddi. 2023. “MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.” In 2023 Design, Automation &Amp; Test in Europe Conference &Amp; Exhibition (DATE), 1–6. IEEE; IEEE. https://doi.org/10.23919/date56975.2023.10137246.\nHowever, balancing robust protections with embedded systems’ tight size and power limits remains challenging. Cryptography limits and lack of secure co-processors on cost-sensitive embedded hardware restrict options. Ultimately, fault injection resilience demands a cross-layer perspective spanning electrical, firmware, software, and physical design layers.\n\n\n14.5.4 Side-Channel Attacks\nSide-channel attacks are a category of security breach that depends on information gained from a computer system’s physical implementation. Unlike direct attacks on software or network vulnerabilities, side-channel attacks exploit a system’s hardware characteristics. These attacks can be particularly effective against complex machine learning systems, where large amounts of data are processed, and a high level of security is expected.\nThe fundamental premise of a side-channel attack is that a device’s operation can inadvertently leak information. Such leaks can come from various sources, including the electrical power a device consumes (Kocher, Jaffe, and Jun 1999), the electromagnetic fields it emits (Gandolfi, Mourtel, and Olivier 2001), the time it takes to process certain operations, or even the sounds it produces. Each channel can indirectly glimpse the system’s internal processes, revealing information that can compromise security.\n\nKocher, Paul, Joshua Jaffe, and Benjamin Jun. 1999. “Differential Power Analysis.” In Advances in CryptologyCRYPTO’99: 19th Annual International Cryptology Conference Santa Barbara, California, USA, August 1519, 1999 Proceedings 19, 388–97. Springer.\n\nGandolfi, Karine, Christophe Mourtel, and Francis Olivier. 2001. “Electromagnetic Analysis: Concrete Results.” In Cryptographic Hardware and Embedded SystemsCHES 2001: Third International Workshop Paris, France, May 1416, 2001 Proceedings 3, 251–61. Springer.\n\nKocher, Paul, Joshua Jaffe, Benjamin Jun, and Pankaj Rohatgi. 2011. “Introduction to Differential Power Analysis.” Journal of Cryptographic Engineering 1 (1): 5–27. https://doi.org/10.1007/s13389-011-0006-y.\nFor instance, consider a machine learning system performing encrypted transactions. Encryption algorithms are supposed to secure data but require computational work to encrypt and decrypt information. An attacker can analyze the power consumption patterns of the device performing encryption to figure out the cryptographic key. With sophisticated statistical methods, small variations in power usage during the encryption process can be correlated with the data being processed, eventually revealing the key. Some differential analysis attack techniques are Differential Power Analysis (DPA) (Kocher et al. 2011), Differential Electromagnetic Analysis (DEMA), and Correlation Power Analysis (CPA).\nFor example, consider an attacker trying to break the AES encryption algorithm using a differential analysis attack. The attacker would first need to collect many power or electromagnetic traces (a trace is a record of consumptions or emissions) of the device while performing AES encryption.\nOnce the attacker has collected sufficient traces, they would then use a statistical technique to identify correlations between the traces and the different values of the plaintext (original, unencrypted text) and ciphertext (encrypted text). These correlations would then be used to infer the value of a bit in the AES key and, eventually, the entire key. Differential analysis attacks are dangerous because they are low-cost, effective, and non-intrusive, allowing attackers to bypass algorithmic and hardware-level security measures. Compromises by these attacks are also hard to detect because they do not physically modify the device or break the encryption algorithm.\nBelow is a simplified visualization of how analyzing the power consumption patterns of the encryption device can help us extract information about the algorithm’s operations and, in turn, the secret data. Say we have a device that takes a 5-byte password as input. We will analyze and compare the different voltage patterns that are measured while the encryption device performs operations on the input to authenticate the password.\nFirst, consider the power analysis of the device’s operations after entering a correct password in the first picture in Figure 14.3. The dense blue graph outputs the encryption device’s voltage measurement. What matters here is the comparison between the different analysis charts rather than the specific details of what is going on in each scenario.\n\n\n\n\n\n\nFigure 14.3: Power analysis of an encryption device with a correct password. Credit: Colin O’Flynn.\n\n\n\nLet’s look at the power analysis chart when we enter an incorrect password in Figure 14.4. The first three bytes of the password are correct. As a result, we can see that the voltage patterns are very similar or identical between the two charts, up to and including the fourth byte. After the device processes the fourth byte, it determines a mismatch between the secret key and the attempted input. We notice a change in the pattern at the transition point between the fourth and fifth bytes: the voltage has gone up (the current has gone down) because the device has stopped processing the rest of the input.\n\n\n\n\n\n\nFigure 14.4: Power analysis of an encryption device with a (partially) wrong password. Credit: Colin O’Flynn.\n\n\n\nFigure 14.5 describes another chart of a completely wrong password. After the device finishes processing the first byte, it determines that it is incorrect and stops further processing - the voltage goes up and the current down.\n\n\n\n\n\n\nFigure 14.5: Power analysis of an encryption device with a wrong password. Credit: Colin O’Flynn.\n\n\n\nThe example above shows how we can infer information about the encryption process and the secret key by analyzing different inputs and trying to ‘eavesdrop’ on the device’s operations on each input byte. For a more detailed explanation, watch Video 14.3 below.\n\n\n\n\n\n\nVideo 14.3: Power Attack\n\n\n\n\n\n\nAnother example is an ML system for speech recognition, which processes voice commands to perform actions. By measuring the time it takes for the system to respond to commands or the power used during processing, an attacker could infer what commands are being processed and thus learn about the system’s operational patterns. Even more subtle, the sound emitted by a computer’s fan or hard drive could change in response to the workload, which a sensitive microphone could pick up and analyze to determine what kind of operations are being performed.\nIn real-world scenarios, side-channel attacks have been used to extract encryption keys and compromise secure communications. One of the earliest recorded side-channel attacks dates back to the 1960s when British intelligence agency MI5 faced the challenge of deciphering encrypted communications from the Egyptian Embassy in London. Their cipher-breaking attempts were thwarted by the computational limitations of the time until an ingenious observation changed the game.\nMI5 agent Peter Wright proposed using a microphone to capture the subtle acoustic signatures emitted from the embassy’s rotor cipher machine during encryption (Burnet and Thomas 1989). The distinct mechanical clicks of the rotors as operators configured them daily leaked critical information about the initial settings. This simple side channel of sound enabled MI5 to dramatically reduce the complexity of deciphering messages. This early acoustic leak attack highlights that side-channel attacks are not merely a digital age novelty but a continuation of age-old cryptanalytic principles. The notion that where there is a signal, there is an opportunity for interception remains foundational. From mechanical clicks to electrical fluctuations and beyond, side channels enable adversaries to extract secrets indirectly through careful signal analysis.\n\nBurnet, David, and Richard Thomas. 1989. “Spycatcher: The Commodification of Truth.” J. Law Soc. 16 (2): 210. https://doi.org/10.2307/1410360.\n\nAsonov, D., and R. Agrawal. 2004. “Keyboard Acoustic Emanations.” In IEEE Symposium on Security and Privacy, 2004. Proceedings. 2004, 3–11. IEEE; IEEE. https://doi.org/10.1109/secpri.2004.1301311.\n\nGnad, Dennis R. E., Fabian Oboril, and Mehdi B. Tahoori. 2017. “Voltage Drop-Based Fault Attacks on FPGAs Using Valid Bitstreams.” In 2017 27th International Conference on Field Programmable Logic and Applications (FPL), 1–7. IEEE; IEEE. https://doi.org/10.23919/fpl.2017.8056840.\n\nZhao, Mark, and G. Edward Suh. 2018. “FPGA-Based Remote Power Side-Channel Attacks.” In 2018 IEEE Symposium on Security and Privacy (SP), 229–44. IEEE; IEEE. https://doi.org/10.1109/sp.2018.00049.\nToday, acoustic cryptanalysis has evolved into attacks like keyboard eavesdropping (Asonov and Agrawal 2004). Electrical side channels range from power analysis on cryptographic hardware (Gnad, Oboril, and Tahoori 2017) to voltage fluctuations (Zhao and Suh 2018) on machine learning accelerators. Timing, electromagnetic emission, and even heat footprints can likewise be exploited. New and unexpected side channels often emerge as computing becomes more interconnected and miniaturized.\nJust as MI5’s analog acoustic leak transformed their codebreaking, modern side-channel attacks circumvent traditional boundaries of cyber defense. Understanding the creative spirit and historical persistence of side channel exploits is key knowledge for developers and defenders seeking to secure modern machine learning systems comprehensively against digital and physical threats.\n\n\n14.5.5 Leaky Interfaces\nLeaky interfaces in embedded systems are often overlooked backdoors that can become significant security vulnerabilities. While designed for legitimate purposes such as communication, maintenance, or debugging, these interfaces may inadvertently provide attackers with a window through which they can extract sensitive information or inject malicious data.\nAn interface becomes “leaky” when it exposes more information than it should, often due to a lack of stringent access controls or inadequate shielding of the transmitted data. Here are some real-world examples of leaky interface issues causing security problems in IoT and embedded devices:\n\nBaby Monitors: Many WiFi-enabled baby monitors have been found to have unsecured interfaces for remote access. This allowed attackers to gain live audio and video feeds from people’s homes, representing a major privacy violation.\nPacemakers: Interface vulnerabilities were discovered in some pacemakers that could allow attackers to manipulate cardiac functions if exploited. This presents a potentially life-threatening scenario.\nSmart Lightbulbs: A researcher found he could access unencrypted data from smart lightbulbs via a debug interface, including WiFi credentials, allowing him to gain access to the connected network (Greengard 2015).\nSmart Cars: If left unsecured, The OBD-II diagnostic port has been shown to provide an attack vector into automotive systems. Researchers could use it to control brakes and other components (Miller and Valasek 2015).\n\n\nGreengard, Samuel. 2015. The Internet of Things. The MIT Press. https://doi.org/10.7551/mitpress/10277.001.0001.\n\nMiller, Charlie, and Chris Valasek. 2015. “Remote Exploitation of an Unaltered Passenger Vehicle.” Black Hat USA 2015 (S 91): 1–91.\nWhile the above are not directly connected with ML, consider the example of a smart home system with an embedded ML component that controls home security based on behavior patterns it learns over time. The system includes a maintenance interface accessible via the local network for software updates and system checks. If this interface does not require strong authentication or the data transmitted through it is not encrypted, an attacker on the same network could gain access. They could then eavesdrop on the homeowner’s daily routines or reprogram the security settings by manipulating the firmware.\nSuch leaks are a privacy issue and a potential entry point for more damaging exploits. The exposure of training data, model parameters, or ML outputs from a leak could help adversaries construct adversarial examples or reverse-engineer models. Access through a leaky interface could also be used to alter an embedded device’s firmware, loading it with malicious code that could turn off the device, intercept data, or use it in botnet attacks.\nTo mitigate these risks, a multi-layered approach is necessary, spanning technical controls like authentication, encryption, anomaly detection, policies and processes like interface inventories, access controls, auditing, and secure development practices. Turning off unnecessary interfaces and compartmentalizing risks via a zero-trust model provide additional protection.\nAs designers of embedded ML systems, we should assess interfaces early in development and continually monitor them post-deployment as part of an end-to-end security lifecycle. Understanding and securing interfaces is crucial for ensuring the overall security of embedded ML.\n\n\n14.5.6 Counterfeit Hardware\nML systems are only as reliable as the underlying hardware. In an era where hardware components are global commodities, the rise of counterfeit or cloned hardware presents a significant challenge. Counterfeit hardware encompasses any components that are unauthorized reproductions of original parts. Counterfeit components infiltrate ML systems through complex supply chains that stretch across borders and involve numerous stages from manufacture to delivery.\nA single lapse in the supply chain’s integrity can result in the insertion of counterfeit parts designed to closely imitate the functions and appearance of genuine hardware. For instance, a facial recognition system for high-security access control may be compromised if equipped with counterfeit processors. These processors could fail to accurately process and verify biometric data, potentially allowing unauthorized individuals to access restricted areas.\nThe challenge with counterfeit hardware is multifaceted. It undermines the quality and reliability of ML systems, as these components may degrade faster or perform unpredictably due to substandard manufacturing. The security risks are also profound; counterfeit hardware can contain vulnerabilities ripe for exploitation by malicious actors. For example, a cloned network router in an ML data center might include a hidden backdoor, enabling data interception or network intrusion without detection.\nFurthermore, counterfeit hardware poses legal and compliance risks. Companies inadvertently utilizing counterfeit parts in their ML systems may face serious legal repercussions, including fines and sanctions for failing to comply with industry regulations and standards. This is particularly true for sectors where compliance with specific safety and privacy regulations is mandatory, such as healthcare and finance.\nThe issue of counterfeit hardware is exacerbated by economic pressures to reduce costs, which can compel businesses to source from lower-cost suppliers without stringent verification processes. This economizing can inadvertently introduce counterfeit parts into otherwise secure systems. Additionally, detecting these counterfeits is inherently difficult since they are created to pass as the original components, often requiring sophisticated equipment and expertise to identify.\nIn ML, where decisions are made in real time and based on complex computations, the consequences of hardware failure are inconvenient and potentially dangerous. Stakeholders in the field of ML need to understand these risks thoroughly. The issues presented by counterfeit hardware necessitate a deep dive into the current challenges facing ML system integrity and emphasize the importance of vigilant, informed management of the hardware life cycle within these advanced systems.\n\n\n14.5.7 Supply Chain Risks\nThe threat of counterfeit hardware is closely tied to broader supply chain vulnerabilities. Globalized, interconnected supply chains create multiple opportunities for compromised components to infiltrate a product’s lifecycle. Supply chains involve numerous entities, from design to manufacturing, assembly, distribution, and integration. A lack of transparency and oversight of each partner makes verifying integrity at every step challenging. Lapses anywhere along the chain can allow the insertion of counterfeit parts.\nFor example, a contracted manufacturer may unknowingly receive and incorporate recycled electronic waste containing dangerous counterfeits. An untrustworthy distributor could smuggle in cloned components. Insider threats at any vendor might deliberately mix counterfeits into legitimate shipments.\nOnce counterfeits enter the supply stream, they move quickly through multiple hands before ending up in ML systems where detection is difficult. Advanced counterfeits like refurbished parts or clones with repackaged externals can masquerade as authentic components, passing visual inspection.\nTo identify fakes, thorough technical profiling using micrography, X-ray screening, component forensics, and functional testing is often required. However, such costly analysis is impractical for large-volume procurement.\nStrategies like supply chain audits, screening suppliers, validating component provenance, and adding tamper-evident protections can help mitigate risks. However, given global supply chain security challenges, a zero-trust approach is prudent. Designing ML systems to utilize redundant checking, fail-safes, and continuous runtime monitoring provides resilience against component compromises.\nRigorous validation of hardware sources coupled with fault-tolerant system architectures offers the most robust defense against the pervasive risks of convoluted, opaque global supply chains.\n\nCase Study\nIn 2018, Bloomberg Businessweek published an alarming story that got much attention in the tech world. The article claimed that Supermicro had secretly planted tiny spy chips on server hardware. Reporters said Chinese state hackers working with Supermicro could sneak these tiny chips onto motherboards during manufacturing. The tiny chips allegedly gave the hackers backdoor access to servers used by over 30 major companies, including Apple and Amazon.\nIf true, this would allow hackers to spy on private data or even tamper with systems. However, after investigating, Apple and Amazon found no proof that such hacked Supermicro hardware existed. Other experts questioned whether the Bloomberg article was accurate reporting.\nWhether the story is completely true or not is not our concern from a pedagogical viewpoint. However, this incident drew attention to the risks of global supply chains for hardware, especially manufactured in China. When companies outsource and buy hardware components from vendors worldwide, there needs to be more visibility into the process. In this complex global pipeline, there are concerns that counterfeits or tampered hardware could be slipped in somewhere along the way without tech companies realizing it. Companies relying too much on single manufacturers or distributors creates risk. For instance, due to the over-reliance on TSMC for semiconductor manufacturing, the U.S. has invested 50 billion dollars into the CHIPS Act.\nAs ML moves into more critical systems, verifying hardware integrity from design through production and delivery is crucial. The reported Supermicro backdoor demonstrated that for ML security, we cannot take global supply chains and manufacturing for granted. We must inspect and validate hardware at every link in the chain.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1551,7 +1551,7 @@ "section": "14.6 Embedded ML Hardware Security", "text": "14.6 Embedded ML Hardware Security\n\n14.6.1 Trusted Execution Environments\n\nAbout TEE\nA Trusted Execution Environment (TEE) is a secure area within a main processor that provides a high level of security for the execution of code and protection of data. TEEs operate by isolating the execution of sensitive tasks from the rest of the device’s operations, thereby creating an environment resistant to attacks from software and hardware vectors.\n\n\nBenefits\nTEEs are particularly valuable in scenarios where sensitive data must be processed or where the integrity of a system’s operations is critical. In the context of ML hardware, TEEs ensure that the ML algorithms and data are protected against tampering and leakage. This is essential because ML models often process private information, trade secrets, or data that could be exploited if exposed.\nFor instance, a TEE can protect ML model parameters from being extracted by malicious software on the same device. This protection is vital for privacy and maintaining the integrity of the ML system, ensuring that the models perform as expected and do not provide skewed outputs due to manipulated parameters. Apple’s Secure Enclave, found in iPhones and iPads, is a form of TEE that provides an isolated environment to protect sensitive user data and cryptographic operations.\nIn ML systems, TEEs can:\n\nSecurely perform model training and inference, ensuring the computation results remain confidential.\nProtect the confidentiality of input data, like biometric information, used for personal identification or sensitive classification tasks.\nSecure ML models by preventing reverse engineering, which can protect proprietary information and maintain a competitive advantage.\nEnable secure updates to ML models, ensuring that updates come from a trusted source and have not been tampered with in transit.\n\nThe importance of TEEs in ML hardware security stems from their ability to protect against external and internal threats, including the following:\n\nMalicious Software: TEEs can prevent high-privilege malware from accessing sensitive areas of the ML system.\nPhysical Tampering: By integrating with hardware security measures, TEEs can protect against physical tampering that attempts to bypass software security.\nSide-channel Attacks: Although not impenetrable, TEEs can mitigate certain side-channel attacks by controlling access to sensitive operations and data patterns.\n\n\n\nMechanics\nThe fundamentals of TEEs contain four main parts:\n\nIsolated Execution: Code within a TEE runs in a separate environment from the device’s main operating system. This isolation protects the code from unauthorized access by other applications.\nSecure Storage: TEEs can securely store cryptographic keys,authentication tokens, and sensitive data, preventing access by regular applications running outside the TEE.\nIntegrity Protection: TEEs can verify the integrity of code and data, ensuring that they have not been altered before execution or during storage.\nData Encryption: Data handled within a TEE can be encrypted, making it unreadable to entities without the proper keys, which are also managed within the TEE.\n\nHere are some examples of TEEs that provide hardware-based security for sensitive applications:\n\nARMTrustZone:This technology creates secure and normal world execution environments isolated using hardware controls and implemented in many mobile chipsets.\nIntelSGX:Intel’s Software Guard Extensions provide an enclave for code execution that protects against certain software attacks, specifically O.S. layer attacks. They are used to safeguard workloads in the cloud.\nQualcomm Secure Execution Environment:A Hardware sandbox on Qualcomm chipsets for mobile payment and authentication apps.\nApple SecureEnclave:TEE for biometric data and key management on iPhones and iPads.Facilitates mobile payments.\n\nFigure 14.6 is a diagram demonstrating a secure enclave isolated from the main processor to provide an extra layer of security. The secure enclave has a boot ROM to establish a hardware root of trust, an AES engine for efficient and secure cryptographic operations, and protected memory. It also has a mechanism to store information securely on attached storage separate from the NAND flash storage used by the application processor and operating system. This design keeps sensitive user data secure even when the Application Processor kernel becomes compromised.\n\n\n\n\n\n\nFigure 14.6: System-on-chip secure enclave. Credit: Apple.\n\n\n\n\n\nTradeoffs\nIf TEEs are so good, why don’t all systems have TEE enabled by default? The decision to implement a TEE is not taken lightly. There are several reasons why a TEE might only be present in some systems by default. Here are some tradeoffs and challenges associated with TEEs:\nCost: Implementing TEEs involves additional costs. There are direct costs for the hardware and indirect costs associated with developing and maintaining secure software for TEEs. These costs may only be justifiable for some devices, especially low-margin products.\nComplexity: TEEs add complexity to system design and development. Integrating a TEE with existing systems requires a substantial redesign of the hardware and software stack, which can be a barrier, especially for legacy systems.\nPerformance Overhead: While TEEs offer enhanced security, they can introduce performance overhead. For example, the additional steps in verifying and encrypting data can slow down system performance, which may be critical in time-sensitive applications.\nDevelopment Challenges: Developing for TEEs requires specialized knowledge and often must adhere to strict development protocols. This can extend development time and complicate the debugging and testing processes.\nScalability and Flexibility: TEEs, due to their secure nature, may impose limitations on scalability and flexibility. Upgrading secure components or scaling the system for more users or data can be more challenging when everything must pass through a secure, enclosed environment.\nEnergy Consumption: The increased processing required for encryption, decryption, and integrity checks can lead to higher energy consumption, a significant concern for battery-powered devices.\nMarket Demand: Not all markets or applications require the level of security provided by TEEs. For many consumer applications, the perceived risk may be low enough that manufacturers opt not to include TEEs in their designs.\nSecurity Certification and Assurance: Systems with TEEs may need rigorous security certifications with bodies like Common Criteria (CC) or the European Union Agency for Cybersecurity (ENISA), which can be lengthy and expensive. Some organizations may choose to refrain from implementing TEEs to avoid these hurdles.\nLimited Resource Devices: Devices with limited processing power, memory, or storage may only support TEEs without compromising their primary functionality.\n\n\n\n14.6.2 Secure Boot\n\nAbout\nA secure boot is a security standard that ensures a device boots using only software trusted by the original equipment manufacturer (OEM). When the device starts up, the firmware checks the signature of each piece of boot software, including the bootloader, kernel, and base operating system, to ensure it’s not tampered with. If the signatures are valid, the device continues to boot. If not, the boot process stops to prevent potential security threats from executing.\n\n\nBenefits\nThe integrity of an ML system is critical from the moment it is powered on. A compromised boot process could undermine the system by allowing malicious software to load before the operating system and ML applications start. This could lead to manipulated ML operations, stolen data, or the device being repurposed for malicious activities such as botnets or crypto-mining.\nSecure Boot helps protect embedded ML hardware in several ways:\n\nProtecting ML Data: Ensuring that the data used by ML models, which may include private or sensitive information, is not exposed to tampering or theft during the boot process.\nGuarding Model Integrity: Maintaining the ML models’ integrity is important, as tampering with them could lead to incorrect or malicious outcomes.\nSecure Model Updates: Enabling secure updates to ML models and algorithms, ensuring that updates are authenticated and have not been altered.\n\n\n\nMechanics\nTEEs benefit from Secure Boot in multiple ways. Figure 14.7 illustrates a flow diagram of a trusted embedded system. For instance, during initial validation, Secure Boot ensures that the code running inside the TEE is the correct and untampered version approved by the device manufacturer. It can ensure resilience against tampering by verifying the digital signatures of the firmware and other critical components; Secure Boot prevents unauthorized modifications that could undermine the TEE’s security properties. Secure Boot establishes a foundation of trust upon which the TEE can securely operate, enabling secure operations such as cryptographic key management, secure processing, and sensitive data handling.\n\n\n\n\n\n\nFigure 14.7: Secure Boot flow. Credit: R. V. and A. (2018).\n\n\nR. V., Rashmi, and Karthikeyan A. 2018. “Secure Boot of Embedded Applications - a Review.” In 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 291–98. IEEE. https://doi.org/10.1109/iceca.2018.8474730.\n\n\n\n\nCase Study: Apple’s Face ID\nLet’s take a real-world example. Apple’s Face ID technology uses advanced machine learning algorithms to enable facial recognition on iPhones and iPads. It relies on a sophisticated framework of sensors and software to accurately map the geometry of a user’s face. For Face ID to function securely and protect user biometric data, the device’s operations must be trustworthy from the moment it is powered on, which is where Secure Boot plays a crucial role. Here’s how Secure Boot works in conjunction with Face ID:\nInitial Verification: When an iPhone is powered on, the Secure Boot process begins in the Secure Enclave, a coprocessor providing an extra security layer. The Secure Enclave is responsible for processing fingerprint data for Touch ID and facial recognition data for Face ID. The boot process verifies that Apple has signed the Secure Enclave’s firmware and has not been tampered with. This step ensures that the firmware used to process biometric data is authentic and safe.\nContinuous Security Checks: After the initial power-on self-test and verification by Secure Boot, the Secure Enclave communicates with the device’s main processor to continue the secure boot chain. It verifies the digital signatures of the iOS kernel and other critical boot components before allowing the boot process to proceed. This chained trust model prevents unauthorized modifications to the bootloader and operating system, which could compromise the device’s security.\nFace Data Processing: Once the device has completed its secure boot sequence, the Secure Enclave can interact safely with the ML algorithms that power Face ID. Facial recognition involves projecting and analyzing over 30,000 invisible dots to create a depth map of the user’s face and an infrared image. This data is then converted into a mathematical representation and compared with the registered face data securely stored in the Secure Enclave.\nSecure Enclave and Data Protection: The Secure Enclave is designed to protect sensitive data and handle the cryptographic operations that secure it. It ensures that even if the operating system kernel is compromised, the facial data cannot be accessed by unauthorized apps or attackers. Face ID data never leaves the device and is not backed up to iCloud or anywhere else.\nFirmware Updates: Apple frequently releases firmware updates to address security vulnerabilities and improve the functionality of its systems. Secure Boot ensures that each firmware update is authenticated and that only updates signed by Apple are installed on the device, preserving the integrity and security of the Face ID system.\nBy using Secure Boot with dedicated hardware like the Secure Enclave, Apple can provide strong security assurances for sensitive operations like facial recognition.\n\n\nChallenges\nImplementing Secure Boot poses several challenges that must be addressed to realize its full benefits.\nKey Management Complexity: Generating, storing, distributing, rotating, and revoking cryptographic keys provably securely is extremely challenging yet vital for maintaining the chain of trust. Any compromise of keys cripples protections. Large enterprises managing multitudes of device keys face particular scale challenges.\nPerformance Overhead: Checking cryptographic signatures during Boot can add 50-100ms or more per component verified. This delay may be prohibitive for time-sensitive or resource-constrained applications. However, performance impacts can be reduced through parallelization and hardware acceleration.\nSigning Burden: Developers must diligently ensure that all software components involved in the boot process - bootloaders, firmware, OS kernel, drivers, applications, etc. are correctly signed by trusted keys. Accommodating third-party code signing remains an issue.\nCryptographic Verification: Secure algorithms and protocols must validate the legitimacy of keys and signatures, avoid tampering or bypass, and support revocation. Accepting dubious keys undermines trust.\nCustomizability Constraints: Vendor-locked Secure Boot architectures limit user control and upgradability. Open-source bootloaders like u-boot and coreboot enable security while supporting customizability.\nScalable Standards: Emerging standards like Device Identifier Composition Engine (DICE) and IDevID promise to securely provision and manage device identities and keys at scale across ecosystems.\nAdopting Secure Boot requires following security best practices around key management, crypto validation, signed updates, and access control. Secure Boot provides a robust foundation for building device integrity and trust when implemented with care.\n\n\n\n14.6.3 Hardware Security Modules\n\nAbout HSM\nA Hardware Security Module (HSM) is a physical device that manages digital keys for strong authentication and provides crypto-processing. These modules are designed to be tamper-resistant and provide a secure environment for performing cryptographic operations. HSMs can come in standalone devices, plug-in cards, or integrated circuits on another device.\nHSMs are crucial for various security-sensitive applications because they offer a hardened, secure enclave for storing cryptographic keys and executing cryptographic functions. They are particularly important for ensuring the security of transactions, identity verifications, and data encryption.\n\n\nBenefits\nHSMs provide several functionalities that are beneficial for the security of ML systems:\nProtecting Sensitive Data: In machine learning applications, models often process sensitive data that can be proprietary or personal. HSMs protect the encryption keys used to secure this data, both at rest and in transit, from exposure or theft.\nEnsuring Model Integrity: The integrity of ML models is vital for their reliable operation. HSMs can securely manage the signing and verification processes for ML software and firmware, ensuring unauthorized parties have not altered the models.\nSecure Model Training and Updates: The training and updating of ML models involve the processing of potentially sensitive data. HSMs ensure that these processes are conducted within a secure cryptographic boundary, protecting against the exposure of training data and unauthorized model updates.\n\n\nTradeoffs\nHSMs involve several tradeoffs for embedded ML. These tradeoffs are similar to TEEs, but for completeness, we will also discuss them here through the lens of HSM.\nCost: HSMs are specialized devices that can be expensive to procure and implement, raising the overall cost of an ML project. This may be a significant factor for embedded systems, where cost constraints are often stricter.\nPerformance Overhead: While secure, the cryptographic operations performed by HSMs can introduce latency. Any added delay can be critical in high-performance embedded ML applications where inference must happen in real-time, such as in autonomous vehicles or translation devices.\nPhysical Space: Embedded systems are often limited by physical space, and adding an HSM can be challenging in tightly constrained environments. This is especially true for consumer electronics and wearable technology, where size and form factor are key considerations.\nPower Consumption: HSMs require power for their operation, which can be a drawback for battery-operated devices with long battery life. The secure processing and cryptographic operations can drain the battery faster, a significant tradeoff for mobile or remote embedded ML applications.\nComplexity in Integration: Integrating HSMs into existing hardware systems adds complexity. It often requires specialized knowledge to manage the secure communication between the HSM and the system’s processor and develop software capable of interfacing with the HSM.\nScalability: Scaling an ML solution that uses HSMs can be challenging. Managing a fleet of HSMs and ensuring uniformity in security practices across devices can become complex and costly when the deployment size increases, especially when dealing with embedded systems where communication is costly.\nOperational Complexity: HSMs can make updating firmware and ML models more complex. Every update must be signed and possibly encrypted, which adds steps to the update process and may require secure mechanisms for key management and update distribution.\nDevelopment and Maintenance: The secure nature of HSMs means that only limited personnel have access to the HSM for development and maintenance purposes. This can slow down the development process and make routine maintenance more difficult.\nCertification and Compliance: Ensuring that an HSM meets specific industry standards and compliance requirements can add to the time and cost of development. This may involve undergoing rigorous certification processes and audits.\n\n\n\n14.6.4 Physical Unclonable Functions (PUFs)\n\nAbout\nPhysical Unclonable Functions (PUFs) provide a hardware-intrinsic means for cryptographic key generation and device authentication by harnessing the inherent manufacturing variability in semiconductor components. During fabrication, random physical factors such as doping variations, line edge roughness, and dielectric thickness result in microscale differences between semiconductors, even when produced from the same masks. These create detectable timing and power variances that act as a \"fingerprint” unique to each chip. PUFs exploit this phenomenon by incorporating integrated circuits to amplify minute timing or power differences into measurable digital outputs.\nWhen stimulated with an input challenge, the PUF circuit produces an output response based on the device’s intrinsic physical characteristics. Due to their physical uniqueness, the same challenge will yield a different response on other devices. This challenge-response mechanism can be used to generate keys securely and identifiers tied to the specific hardware, perform device authentication, or securely store secrets. For example, a key derived from a PUF will only work on that device and cannot be cloned or extracted even with physical access or full reverse engineering (Gao, Al-Sarawi, and Abbott 2020).\n\n\nBenefits\nPUF key generation avoids external key storage, which risks exposure. It also provides a foundation for other hardware security primitives like Secure Boot. Implementation challenges include managing varying reliability and entropy across different PUFs, sensitivity to environmental conditions, and susceptibility to machine learning modeling attacks. When designed carefully, PUFs enable promising applications in IP protection, trusted computing, and anti-counterfeiting.\n\n\nUtility\nMachine learning models are rapidly becoming a core part of the functionality for many embedded devices, such as smartphones, smart home assistants, and autonomous drones. However, securing ML on resource-constrained embedded hardware can be challenging. This is where physical unclonable functions (PUFs) come in uniquely handy. Let’s look at some examples of how PUFs can be useful.\nPUFs provide a way to generate unique fingerprints and cryptographic keys tied to the physical characteristics of each chip on the device. Let’s take an example. We have a smart camera drone that uses embedded ML to track objects. A PUF integrated into the drone’s processor could create a device-specific key to encrypt the ML model before loading it onto the drone. This way, even if an attacker somehow hacks the drone and tries to steal the model, they won’t be able to use it on another device!\nThe same PUF key could also create a digital watermark embedded in the ML model. If that model ever gets leaked and posted online by someone trying to pirate it, the watermark could help prove it came from your stolen drone and didn’t originate from the attacker. Also, imagine the drone camera connects to the cloud to offload some of its ML processing. The PUF can authenticate that the camera is legitimate before the cloud will run inference on sensitive video feeds. The cloud could verify that the drone has not been physically tampered with by checking that the PUF responses have not changed.\nPUFs enable all this security through their challenge-response behavior’s inherent randomness and hardware binding. Without needing to store keys externally, PUFs are ideal for securing embedded ML with limited resources. Thus, they offer a unique advantage over other mechanisms.\n\n\nMechanics\nThe working principle behind PUFs, shown in Figure 14.8, involves generating a \"challenge-response” pair, where a specific input (the challenge) to the PUF circuit results in an output (the response) that is determined by the unique physical properties of that circuit. This process can be likened to a fingerprinting mechanism for electronic devices. Devices that utilize ML for processing sensor data can employ PUFs to secure communication between devices and prevent the execution of ML models on counterfeit hardware.\nFigure 14.8 illustrates an overview of the PUF basics: a) PUF can be thought of as a unique fingerprint for each piece of hardware; b) an Optical PUF is a special plastic token that is illuminated, creating a unique speckle pattern that is then recorded; c) in an APUF (Arbiter PUF), challenge bits select different paths, and a judge decides which one is faster, giving a response of ‘1’ or ‘0’; d) in an SRAM PUF, the response is determined by the mismatch in the threshold voltage of transistors, where certain conditions lead to a preferred response of ‘1’. Each of these methods uses specific characteristics of the hardware to create a unique identifier.\n\n\n\n\n\n\nFigure 14.8: PUF basics. Credit: Gao, Al-Sarawi, and Abbott (2020).\n\n\nGao, Yansong, Said F. Al-Sarawi, and Derek Abbott. 2020. “Physical Unclonable Functions.” Nature Electronics 3 (2): 81–91. https://doi.org/10.1038/s41928-020-0372-5.\n\n\n\n\nChallenges\nThere are a few challenges with PUFs. The PUF response can be sensitive to environmental conditions, such as temperature and voltage fluctuations, leading to inconsistent behavior that must be accounted for in the design. Also, since PUFs can generate many unique challenge-response pairs, managing and ensuring the consistency of these pairs across the device’s lifetime can be challenging. Last but not least, integrating PUF technology may increase the overall manufacturing cost of a device, although it can save costs in key management over the device’s lifecycle.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1562,7 +1562,7 @@ "section": "14.7 Privacy Concerns in Data Handling", "text": "14.7 Privacy Concerns in Data Handling\nHandling personal and sensitive data securely and ethically is critical as machine learning permeates devices like smartphones, wearables, and smart home appliances. For medical hardware, handling data securely and ethically is further required by law through the Health Insurance Portability and Accountability Act (HIPAA). These embedded ML systems pose unique privacy risks, given their intimate proximity to users’ lives.\n\n14.7.1 Sensitive Data Types\nEmbedded ML devices like wearables, smart home assistants, and autonomous vehicles frequently process highly personal data that requires careful handling to maintain user privacy and prevent misuse. Specific examples include medical reports and treatment plans processed by health wearables, private conversations continuously captured by smart home assistants, and detailed driving habits collected by connected cars. Compromise of such sensitive data can lead to serious consequences like identity theft, emotional manipulation, public shaming, and mass surveillance overreach.\nSensitive data takes many forms - structured records like contact lists and unstructured content like conversational audio and video streams. In medical settings, protected health information (PHI) is collected by doctors throughout every interaction and is heavily regulated by strict HIPAA guidelines. Even outside of medical settings, sensitive data can still be collected in the form of Personally Identifiable Information (PII), which is defined as “any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means.” Examples of PII include email addresses, social security numbers, and phone numbers, among other fields. PII is collected in medical settings and other settings (financial applications, etc) and is heavily regulated by Department of Labor policies.\nEven derived model outputs could indirectly leak details about individuals. Beyond just personal data, proprietary algorithms and datasets also warrant confidentiality protections. In the Data Engineering section, we covered several topics in detail.\nTechniques like de-identification, aggregation, anonymization, and federation can help transform sensitive data into less risky forms while retaining analytical utility. However, diligent controls around access, encryption, auditing, consent, minimization, and compliance practices are still essential throughout the data lifecycle. Regulations like GDPR categorize different classes of sensitive data and prescribe responsibilities around their ethical handling. Standards like NIST 800-53 provide rigorous security control guidance for confidentiality protection. With growing reliance on embedded ML, understanding sensitive data risks is crucial.\n\n\n14.7.2 Applicable Regulations\nMany embedded ML applications handle sensitive user data under HIPAA, GDPR, and CCPA regulations. Understanding the protections mandated by these laws is crucial for building compliant systems.\n\nHIPAA Privacy Rule establishes care providers that conduct certain governs medical data privacy and security in the US, with severe penalties for violations. Any health-related embedded ML devices like diagnostic wearables or assistive robots would need to implement controls like audit trails, access controls, and encryption prescribed by HIPAA.\nGDPR imposes transparency, retention limits, and user rights on EU citizen data, even when processed by companies outside the EU. Smart home systems capturing family conversations or location patterns would need GDPR compliance. Key requirements include data minimization, encryption, and mechanisms for consent and erasure.\nCCPA, which applies in California, protects consumer data privacy through provisions like required disclosures and opt-out rights—ioT gadgets like smart speakers and fitness trackers Californians use likely to fall under its scope.\nThe CCPA was the first state-specific set of regulations regarding privacy concerns. Following the CCPA, similar regulations were also enacted in 10 other states, with some states proposing bills for consumer data privacy protections.\n\nAdditionally, when relevant to the application, sector-specific rules govern telematics, financial services, utilities, etc. Best practices like Privacy by design, impact assessments, and maintaining audit trails help embed compliance if it is not already required by law. Given potentially costly penalties, consulting legal/compliance teams is advisable when developing regulated embedded ML systems.\n\n\n14.7.3 De-identification\nIf medical data is de-identified thoroughly, HIPAA guidelines do not directly apply, and there are far fewer regulations. However, medical data needs to be de-identified using HIPAA methods (Safe Harbor methods or Expert Determination methods) for HIPAA guidelines to no longer apply.\n\nSafe Harbor Methods\nSafe Harbor methods are most commonly used for de-identifying protected healthcare information due to the limited resources needed compared to Expert Determination methods. Safe Harbor de-identification requires scrubbing datasets of any data that falls into one of 18 categories. The following categories are listed as sensitive information based on the Safe Harbor standard:\n\nName, Geographic locator, Birthdate, Phone Number, Email Address, addresses, Social Security Numbers, Medical Record Numbers, health beneficiary Numbers, Device Identifiers and Serial Numbers, Certificate/License Numbers (Birth Certificate, Drivers License, etc), Account Numbers, Vehicle Identifiers, Website URLs, FullFace Photos and Comparable Images, Biometric Identifiers, Any other unique identifiers\n\nFor most of these categories, all data must be removed regardless of the circumstances. For other categories, including geographical information and birthdate, the data can be partially removed enough to make the information hard to re-identify. For example, if a zip code is large enough, the first 3 digits can remain since there are enough people in the geographic area to make re-identification difficult. Birthdates need to be scrubbed of all elements except birth year, and all ages above 89 need to be aggregated into a 90+ category.\n\n\nExpert Determination Methods\nSafe Harbor methods work for several cases of medical data de-identification, though re-identification is still possible in some cases. For example, let’s say you collect data on a patient in an urban city with a large zip code, but you have documented a rare disease that they have—a disease that only 25 people have in the entire city. Given geographic data coupled with birth year, it is highly possible that someone can re-identify this individual, which is an extremely detrimental privacy breach.\nIn unique cases like these, expert determination data de-identification methods are preferred. Expert determination de-identification requires a “person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable” to evaluate a dataset and determine if the risk of re-identification of individual data in a given dataset in combination with publicly available data (voting records, etc.), is extremely small.\nExpert Determination de-identification is understandably harder to complete than Safe Harbour de-identification due to the cost and feasibility of accessing an expert to verify the likelihood of re-identifying a dataset. However, in many cases, expert determination is required to ensure that re-identification of data is extremely unlikely.\n\n\n\n14.7.4 Data Minimization\nData minimization involves collecting, retaining, and processing only the necessary user data to reduce privacy risks from embedded ML systems. This starts by restricting the data types and instances gathered to the bare minimum required for the system’s core functionality. For example, an object detection model only collects the images needed for that specific computer vision task. Similarly, a voice assistant would limit audio capture to specific spoken commands rather than persistently recording ambient sounds.\nWhere possible, temporary data that briefly resides in memory without persisting storage provides additional minimization. A clear legal basis, like user consent, should be established for collection and retention. Sandboxing and access controls prevent unauthorized use beyond intended tasks. Retention periods should be defined based on purpose, with secure deletion procedures removing expired data.\nData minimization can be broken down into 3 categories:\n\n“Data must be adequate about the purpose that is pursued.” Data omission can limit the accuracy of models trained on the data and any general usefulness of a dataset. Data minimization requires a minimum amount of data to be collected from users while creating a dataset that adds value to others.\nThe data collected from users must be relevant to the purpose of the data collection.\nUsers’ data should be limited to only the necessary data to fulfill the purpose of the initial data collection. If similarly robust and accurate results can be obtained from a smaller dataset, any additional data beyond this smaller dataset should not be collected.\n\nEmerging techniques like differential Privacy, federated learning, and synthetic data generation allow useful insights derived from less raw user data. Performing data flow mapping and impact assessments helps identify opportunities to minimize raw data usage.\nMethodologies like Privacy by Design (Cavoukian 2009) consider such minimization early in system architecture. Regulations like GDPR also mandate data minimization principles. With a multilayered approach across legal, technical, and process realms, data minimization limits risks in embedded ML products.\n\nCavoukian, Ann. 2009. “Privacy by Design.” Office of the Information and Privacy Commissioner.\n\nCase Study - Performance-Based Data Minimization\nPerformance-based data minimization (Biega et al. 2020) focuses on expanding upon the third category of data minimization mentioned above, namely limitation. It specifically defines the robustness of model results on a given dataset by certain performance metrics, such that data should not be additionally collected if it does not significantly improve performance. Performance metrics can be divided into two categories:\n\nBiega, Asia J., Peter Potash, Hal Daumé, Fernando Diaz, and Michèle Finck. 2020. “Operationalizing the Legal Principle of Data Minimization for Personalization.” In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, edited by Jimmy Huang, Yi Chang, Xueqi Cheng, Jaap Kamps, Vanessa Murdock, Ji-Rong Wen, and Yiqun Liu, 399–408. ACM. https://doi.org/10.1145/3397271.3401034.\n\nGlobal data minimization performance\n\n\nSatisfied if a dataset minimizes the amount of per-user data while its mean performance across all data is comparable to the mean performance of the original, unminimized dataset.\n\n\nPer user data minimization performance\n\n\nSatisfied if a dataset minimizes the amount of per-user data while the minimum performance of individual user data is comparable to that of individual user data in the original, unminimized dataset.\n\nPerformance-based data minimization can be leveraged in machine-learning settings, including movie recommendation algorithms and e-commerce settings.\nGlobal data minimization is much more feasible than per-user data minimization, given the much more significant difference in per-user losses between the minimized and original datasets.\n\n\n\n14.7.5 Consent and Transparency\nMeaningful consent and transparency are crucial when collecting user data for embedded ML products like smart speakers, wearables, and autonomous vehicles. When first set up. Ideally, the device should clearly explain what data types are gathered, for what purposes, how they are processed, and retention policies. For example, a smart speaker might collect voice samples to train speech recognition and personalized voice profiles. During use, reminders and dashboard options provide ongoing transparency into how data is handled, such as weekly digests of captured voice snippets. Control options allow revoking or limiting consent, like turning off the storage of voice profiles.\nConsent flows should provide granular controls beyond just binary yes/no choices. For instance, users could selectively consent to certain data uses, such as training speech recognition, but not personalization. Focus groups and usability testing with target users shape consent interfaces and wording of privacy policies to optimize comprehension and control. Respecting user rights, such as data deletion and rectification, demonstrates trustworthiness. Vague legal jargon hampers transparency. Regulations like GDPR and CCPA reinforce consent requirements. Thoughtful consent and transparency provide users agency over their data while building trust in embedded ML products through open communication and control.\n\n\n14.7.6 Privacy Concerns in Machine Learning\n\nGenerative AI\nPrivacy and security concerns have also risen with the public use of generative AI models, including OpenAI’s GPT4 and other LLMs. ChatGPT, in particular, has been discussed more recently about Privacy, given all the personal information collected from ChatGPT users. In June, a class action lawsuit was filed against ChatGPT due to concerns that it was trained on proprietary medical and personal information without proper permissions or consent. As a result of these privacy concerns, many companies have prohibited their employees from accessing ChatGPT, and uploading private, company related information to the chatbot. Further, ChatGPT is susceptible to prompt injection and other security attacks that could compromise the privacy of the proprietary data upon which it was trained.\n\nCase Study\nWhile ChatGPT has instituted protections to prevent people from accessing private and ethically questionable information, several individuals have successfully bypassed these protections through prompt injection and other security attacks. As demonstrated in Figure 14.9, users can bypass ChatGPT protections to mimic the tone of a “deceased grandmother” to learn how to bypass a web application firewall (Gupta et al. 2023).\n\n\n\n\n\n\nFigure 14.9: Grandma role play to bypass safety restrictions. Credit: Gupta et al. (2023).\n\n\n\nFurther, users have also successfully used reverse psychology to manipulate ChatGPT and access information initially prohibited by the model. In Figure 14.10, a user is initially prevented from learning about piracy websites through ChatGPT but can bypass these restrictions using reverse psychology.\n\n\n\n\n\n\nFigure 14.10: Reverse psychology to bypass safety restrictions. Credit: Gupta et al. (2023).\n\n\nGupta, Maanak, Charankumar Akiri, Kshitiz Aryal, Eli Parker, and Lopamudra Praharaj. 2023. “From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy.” #IEEE_O_ACC# 11: 80218–45. https://doi.org/10.1109/access.2023.3300381.\n\n\nThe ease at which security attacks can manipulate ChatGPT is concerning, given the private information it was trained upon without consent. Further research on data privacy in LLMs and generative AI should focus on preventing the model from being so naive to prompt injection attacks.\n\n\n\nData Erasure\nMany previous regulations mentioned above, including GDPR, include a “right to be forgotten” clause. This clause essentially states that “the data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay.” However, in several cases, even if user data has been erased from a platform, the data is only partially erased if a machine learning model has been trained on this data for separate purposes. Through methods similar to membership inference attacks, other individuals can still predict the training data a model was trained upon, even if the data’s presence was explicitly removed online.\nOne approach to addressing privacy concerns with machine learning training data has been through differential privacy methods. For example, by adding Laplacian noise in the training set, a model can be robust to membership inference attacks, preventing deleted data from being recovered. Another approach to preventing deleted data from being inferred from security attacks is simply retraining the model from scratch on the remaining data. Since this process is time-consuming and computationally expensive, other researchers have attempted to address privacy concerns surrounding inferring model training data through a process called machine unlearning, in which a model actively iterates on itself to remove the influence of “forgotten” data that it might have been trained on, as mentioned below.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1573,7 +1573,7 @@ "section": "14.8 Privacy-Preserving ML Techniques", "text": "14.8 Privacy-Preserving ML Techniques\nMany techniques have been developed to preserve privacy, each addressing different aspects and data security challenges. These methods can be broadly categorized into several key areas: Differential Privacy, which focuses on statistical privacy in data outputs; Federated Learning, emphasizing decentralized data processing; Homomorphic Encryption and Secure Multi-party Computation (SMC), both enabling secure computations on encrypted or private data; Data Anonymization and Data Masking and Obfuscation, which alter data to protect individual identities; Private Set Intersection and Zero-Knowledge Proofs, facilitating secure data comparisons and validations; Decentralized Identifiers (DIDs) for self-sovereign digital identities; Privacy-Preserving Record Linkage (PPRL), linking data across sources without exposure; Synthetic Data Generation, creating artificial datasets for safe analysis; and Adversarial Learning Techniques, enhancing data or model resistance to privacy attacks.\nGiven the extensive range of these techniques, it is not feasible to delve into each in depth within a single course or discussion, let alone for anyone to know it all in its glorious detail. Therefore, we will explore a few specific techniques in relative detail, providing a deeper understanding of their principles, applications, and the unique privacy challenges they address in machine learning. This focused approach will give us a more comprehensive and practical understanding of key privacy-preserving methods in modern ML systems.\n\n14.8.1 Differential Privacy\n\nCore Idea\nDifferential Privacy is a framework for quantifying and managing the privacy of individuals in a dataset (Dwork et al. 2006). It provides a mathematical guarantee that the privacy of individuals in the dataset will not be compromised, regardless of any additional knowledge an attacker may possess. The core idea of differential Privacy is that the outcome of any analysis (like a statistical query) should be essentially the same, whether any individual’s data is included in the dataset or not. This means that by observing the analysis result, one cannot determine whether any individual’s data was used in the computation.\n\nDwork, Cynthia, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. “Calibrating Noise to Sensitivity in Private Data Analysis.” In Theory of Cryptography, edited by Shai Halevi and Tal Rabin, 265–84. Berlin, Heidelberg: Springer Berlin Heidelberg.\nFor example, let’s say a database contains medical records for 10 patients. We want to release statistics about the prevalence of diabetes in this sample without revealing one patient’s condition. To do this, we could add a small amount of random noise to the true count before releasing it. If the true number of diabetes patients is 6, we might add noise from a Laplace distribution to randomly output 5, 6, or 7 each with some probability. An observer now can’t tell if any single patient has diabetes based only on the noisy output. The query result looks similar to whether each patient’s data is included or excluded. This is differential Privacy. More formally, a randomized algorithm satisfies ε-differential Privacy if, for any neighbor databases D and Dʹ differing by only one entry, the probability of any outcome changes by at most a factor of ε. A lower ε provides stronger privacy guarantees.\nThe Laplace Mechanism is one of the most straightforward and commonly used methods to achieve differential Privacy. It involves adding noise that follows a Laplace distribution to the data or query results. Apart from the Laplace Mechanism, the general principle of adding noise is central to differential Privacy. The idea is to add random noise to the data or the results of a query. The noise is calibrated to ensure the necessary privacy guarantee while keeping the data useful.\nWhile the Laplace distribution is common, other distributions like Gaussian can also be used. Laplace noise is used for strict ε-differential Privacy for low-sensitivity queries. In contrast, Gaussian distributions can be used when Privacy is not guaranteed, known as (ϵ, 𝛿)-Differential Privacy. In this relaxed version of differential Privacy, epsilon and delta define the amount of Privacy guaranteed when releasing information or a model related to a dataset. Epsilon sets a bound on how much information can be learned about the data based on the output. At the same time, delta allows for a small probability of the privacy guarantee to be violated. The choice between Laplace, Gaussian, and other distributions will depend on the specific requirements of the query and the dataset and the tradeoff between Privacy and accuracy.\nTo illustrate the tradeoff of Privacy and accuracy in (\\(\\epsilon\\), \\(\\delta\\))-differential Privacy, the following graphs in Figure 14.11 show the results on accuracy for different noise levels on the MNIST dataset, a large dataset of handwritten digits (Abadi et al. 2016). The delta value (black line; right y-axis) denotes the level of privacy relaxation (a high value means Privacy is less stringent). As Privacy becomes more relaxed, the accuracy of the model increases.\n\n\n\n\n\n\nFigure 14.11: Privacy-accuracy tradeoff. Credit: Abadi et al. (2016).\n\n\nAbadi, Martin, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. “Deep Learning with Differential Privacy.” In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308–18. CCS ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2976749.2978318.\n\n\nThe key points to remember about differential Privacy are the following:\n\nAdding Noise: The fundamental technique in differential Privacy is adding controlled random noise to the data or query results. This noise masks the contribution of individual data points.\nBalancing Act: There’s a balance between Privacy and accuracy. More noise (lower ϵ) in the data means higher Privacy but less accuracy in the model’s results.\nUniversality: Differential Privacy doesn’t rely on assumptions about what an attacker knows. This makes it robust against re-identification attacks, where an attacker tries to uncover individual data.\nApplicability: It can be applied to various types of data and queries, making it a versatile tool for privacy-preserving data analysis.\n\n\n\nTradeoffs\nThere are several tradeoffs to make with differential Privacy, as is the case with any algorithm. But let’s focus on the computational-specific tradeoffs since we care about ML systems. There are some key computational considerations and tradeoffs when implementing differential Privacy in a machine-learning system:\nNoise generation: Implementing differential Privacy introduces several important computational tradeoffs compared to standard machine learning techniques. One major consideration is the need to securely generate random noise from distributions like Laplace or Gaussian that get added to query results and model outputs. High-quality cryptographic random number generation can be computationally expensive.\nSensitivity analysis: Another key requirement is rigorously tracking the sensitivity of the underlying algorithms to single data points getting added or removed. This global sensitivity analysis is required to calibrate the noise levels properly. However, analyzing worst-case sensitivity can substantially increase computational complexity for complex model training procedures and data pipelines.\nPrivacy budget management: Managing the privacy loss budget across multiple queries and learning iterations is another bookkeeping overhead. The system must keep track of cumulative privacy costs and compose them to explain overall privacy guarantees. This adds a computational burden beyond just running queries or training models.\nBatch vs. online tradeoffs: For online learning systems with continuous high-volume queries, differentially private algorithms require new mechanisms to maintain utility and prevent too much accumulated privacy loss since each query can potentially alter the privacy budget. Batch offline processing is simpler from a computational perspective as it processes data in large batches, where each batch is treated as a single query. High-dimensional sparse data also increases sensitivity analysis challenges.\nDistributed training: When training models using distributed or federated approaches, new cryptographic protocols are needed to track and bound privacy leakage across nodes. Secure multiparty computation with encrypted data for differential Privacy adds substantial computational load.\nWhile differential Privacy provides strong formal privacy guarantees, implementing it rigorously requires additions and modifications to the machine learning pipeline at a computational cost. Managing these overheads while preserving model accuracy remains an active research area.\n\n\nCase Study\nApple’s implementation of differential Privacy in iOS and MacOS provides a prominent real-world example of how differential Privacy can be deployed at large scale. Apple wanted to collect aggregated usage statistics across their ecosystem to improve products and services, but aimed to do so without compromising individual user privacy.\nTo achieve this, they implemented differential privacy techniques directly on user devices to anonymize data points before sending them to Apple servers. Specifically, Apple uses the Laplace mechanism to inject carefully calibrated random noise. For example, suppose a user’s location history contains [Work, Home, Work, Gym, Work, Home]. In that case, the differentially private version might replace the exact locations with a noisy sample like [Gym, Home, Work, Work, Home, Work].\nApple tunes the Laplace noise distribution to provide a high level of Privacy while preserving the utility of aggregated statistics. Increasing noise levels provides stronger privacy guarantees (lower ε values in DP terminology) but can reduce data utility. Apple’s privacy engineers empirically optimized this tradeoff based on their product goals.\nApple obtains high-fidelity aggregated statistics by aggregating hundreds of millions of noisy data points from devices. For instance, they can analyze new iOS apps’ features while masking any user’s app behaviors. On-device computation avoids sending raw data to Apple servers.\nThe system uses hardware-based secure random number generation to sample from the Laplace distribution on devices efficiently. Apple also had to optimize its differentially private algorithms and pipeline to operate under the computational constraints of consumer hardware.\nMultiple third-party audits have verified that Apple’s system provides rigorous differential privacy protections in line with their stated policies. Of course, assumptions around composition over time and potential re-identification risks still apply. Apple’s deployment shows how differential Privacy can be realized in large real-world products when backed by sufficient engineering resources.\n\n\n\n\n\n\nExercise 14.1: Differential Privacy - TensorFlow Privacy\n\n\n\n\n\nWant to train an ML model without compromising anyone’s secrets? Differential Privacy is like a superpower for your data! In this Colab, we’ll use TensorFlow Privacy to add special noise during training. This makes it way harder for anyone to determine if a single person’s data was used, even if they have sneaky ways of peeking at the model.\n\n\n\n\n\n\n\n14.8.2 Federated Learning\n\nCore Idea\nFederated Learning (FL) is a type of machine learning in which a model is built and distributed across multiple devices or servers while keeping the training data localized. It was previously discussed in the Model Optimizations chapter. Still, we will recap it here briefly to complete it and focus on things that pertain to this chapter.\nFL aims to train machine learning models across decentralized networks of devices or systems while keeping all training data localized. Figure 14.12 illustrates this process: each participating device leverages its local data to calculate model updates, which are then aggregated to build an improved global model. However, the raw training data is never directly shared, transferred, or compiled. This privacy-preserving approach allows for the joint development of ML models without centralizing the potentially sensitive training data in one place.\n\n\n\n\n\n\nFigure 14.12: Federated Learning lifecycle. Credit: Jin et al. (2020).\n\n\nJin, Yilun, Xiguang Wei, Yang Liu, and Qiang Yang. 2020. “Towards Utilizing Unlabeled Data in Federated Learning: A Survey and Prospective.” arXiv Preprint arXiv:2002.11545.\n\n\nOne of the most common model aggregation algorithms is Federated Averaging (FedAvg), where the global model is created by averaging all of the parameters from local parameters. While FedAvg works well with independent and identically distributed data (IID), alternate algorithms like Federated Proximal (FedProx) are crucial in real-world applications where data is often non-IID. FedProx is designed for the FL process when there is significant heterogeneity in the client updates due to diverse data distributions across devices, computational capabilities, or varied amounts of data.\nBy leaving the raw data distributed and exchanging only temporary model updates, federated learning provides a more secure and privacy-enhancing alternative to traditional centralized machine learning pipelines. This allows organizations and users to benefit collaboratively from shared models while maintaining control and ownership over sensitive data. The decentralized nature of FL also makes it robust to single points of failure.\nImagine a group of hospitals that want to collaborate on a study to predict patient outcomes based on their symptoms. However, they cannot share their patient data due to privacy concerns and regulations like HIPAA. Here’s how Federated Learning can help.\n\nLocal Training: Each hospital trains a machine learning model on patient data. This training happens locally, meaning the data never leaves the hospital’s servers.\nModel Sharing: After training, each hospital only sends the model (specifically, its parameters or weights ) to a central server. It does not send any patient data.\nAggregating Models: The central server aggregates these models from all hospitals into a single, more robust model. This process typically involves averaging the model parameters.\nBenefit: The result is a machine learning model that has learned from a wide range of patient data without sharing sensitive data or removing it from its original location.\n\n\n\nTradeoffs\nThere are several system performance-related aspects of FL in machine learning systems. It would be wise to understand these tradeoffs because there is no “free lunch” for preserving Privacy through FL (Li et al. 2020).\n\nLi, Tian, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. 2020. “Federated Learning: Challenges, Methods, and Future Directions.” IEEE Signal Process Mag. 37 (3): 50–60. https://doi.org/10.1109/msp.2020.2975749.\nCommunication Overhead and Network Constraints: In FL, one of the most significant challenges is managing the communication overhead. This involves the frequent transmission of model updates between a central server and numerous client devices, which can be bandwidth-intensive. The total number of communication rounds and the size of transmitted messages per round need to be reduced to minimize communication further. This can lead to substantial network traffic, especially in scenarios with many participants. Additionally, latency becomes a critical factor — the time taken for these updates to be sent, aggregated, and redistributed can introduce delays. This affects the overall training time and impacts the system’s responsiveness and real-time capabilities. Managing this communication while minimizing bandwidth usage and latency is crucial for implementing FL.\nComputational Load on Local Devices: FL relies on client devices (like smartphones or IoT devices, which especially matter in TinyML) for model training, which often have limited computational power and battery life. Running complex machine learning algorithms locally can strain these resources, leading to potential performance issues. Moreover, the capabilities of these devices can vary significantly, resulting in uneven contributions to the model training process. Some devices process updates faster and more efficiently than others, leading to disparities in the learning process. Balancing the computational load to ensure consistent participation and efficiency across all devices is a key challenge in FL.\nModel Training Efficiency: FL’s decentralized nature can impact model training’s efficiency. Achieving convergence, where the model no longer significantly improves, can be slower in FL than in centralized training methods. This is particularly true in cases where the data is non-IID (non-independent and identically distributed) across devices. Additionally, the algorithms used for aggregating model updates play a critical role in the training process. Their efficiency directly affects the speed and effectiveness of learning. Developing and implementing algorithms that can handle the complexities of FL while ensuring timely convergence is essential for the system’s performance.\nScalability Challenges: Scalability is a significant concern in FL, especially as the number of participating devices increases. Managing and coordinating model updates from many devices adds complexity and can strain the system. Ensuring that the system architecture can efficiently handle this increased load without degrading performance is crucial. This involves not just handling the computational and communication aspects but also maintaining the quality and consistency of the model as the scale of the operation grows. A key challenge is designing FL systems that scale effectively while maintaining performance.\nData Synchronization and Consistency: Ensuring data synchronization and maintaining model consistency across all participating devices in FL is challenging. Keeping all devices synchronized with the latest model version can be difficult in environments with intermittent connectivity or devices that go offline periodically. Furthermore, maintaining consistency in the learned model, especially when dealing with a wide range of devices with different data distributions and update frequencies, is crucial. This requires sophisticated synchronization and aggregation strategies to ensure that the final model accurately reflects the learnings from all devices.\nEnergy Consumption: The energy consumption of client devices in FL is a critical factor, particularly for battery-powered devices like smartphones and other TinyML/IoT devices. The computational demands of training models locally can lead to significant battery drain, which might discourage continuous participation in the FL process. Balancing the computational requirements of model training with energy efficiency is essential. This involves optimizing algorithms and training processes to reduce energy consumption while achieving effective learning outcomes. Ensuring energy-efficient operation is key to user acceptance and the sustainability of FL systems.\n\n\nCase Studies\nHere are a couple of real-world case studies that can illustrate the use of federated learning:\n\nGoogle Gboard\nGoogle uses federated learning to improve predictions on its Gboard mobile keyboard app. The app runs a federated learning algorithm on users’ devices to learn from their local usage patterns and text predictions while keeping user data private. The model updates are aggregated in the cloud to produce an enhanced global model. This allows for providing next-word predictions personalized to each user’s typing style while avoiding directly collecting sensitive typing data. Google reported that the federated learning approach reduced prediction errors by 25% compared to the baseline while preserving Privacy.\n\n\nHealthcare Research\nThe UK Biobank and American College of Cardiology combined datasets to train a model for heart arrhythmia detection using federated learning. The datasets could not be combined directly due to legal and Privacy restrictions. Federated learning allowed collaborative model development without sharing protected health data, with only model updates exchanged between the parties. This improved model accuracy as it could leverage a wider diversity of training data while meeting regulatory requirements.\n\n\nFinancial Services\nBanks are exploring using federated learning for anti-money laundering (AML) detection models. Multiple banks could jointly improve AML Models without sharing confidential customer transaction data with competitors or third parties. Only the model updates need to be aggregated rather than raw transaction data. This allows access to richer training data from diverse sources while avoiding regulatory and confidentiality issues around sharing sensitive financial customer data.\nThese examples demonstrate how federated learning provides tangible privacy benefits and enables collaborative ML in settings where direct data sharing is impossible.\n\n\n\n\n14.8.3 Machine Unlearning\n\nCore Idea\nMachine unlearning is a fairly new process that describes how the influence of a subset of training data can be removed from the model. Several methods have been used to perform machine unlearning and remove the influence of a subset of training data from the final model. A baseline approach might consist of simply fine-tuning the model for more epochs on just the data that should be remembered to decrease the influence of the data “forgotten” by the model. Since this approach doesn’t explicitly remove the influence of data that should be erased, membership inference attacks are still possible, so researchers have adopted other approaches to unlearn data from a model explicitly. One type of approach that researchers have adopted includes adjusting the model loss function to treat the losses of the “forget set explicitly” (data to be unlearned) and the “retain set” (remaining data that should still be remembered) differently (Tarun et al. 2022; Khan and Swaroop 2021).\n\nTarun, Ayush K, Vikram S Chundawat, Murari Mandal, and Mohan Kankanhalli. 2022. “Deep Regression Unlearning.” ArXiv Preprint abs/2210.08196. https://arxiv.org/abs/2210.08196.\n\nKhan, Mohammad Emtiyaz, and Siddharth Swaroop. 2021. “Knowledge-Adaptation Priors.” In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, edited by Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, 19757–70. https://proceedings.neurips.cc/paper/2021/hash/a4380923dd651c195b1631af7c829187-Abstract.html.\n\n\nCase Study\nSome researchers demonstrate a real-life example of machine unlearning approaches applied to SOTA machine learning models through training an LLM, LLaMA2-7b, to unlearn any references to Harry Potter (Eldan and Russinovich 2023). Though this model took 184K GPU hours to pre-train, it only took 1 GPU hour of fine-tuning to erase the model’s ability to generate or recall Harry Potter-related content without noticeably compromising the accuracy of generating content unrelated to Harry Potter. Figure 14.13 demonstrates how the model output changes before (Llama-7b-chat-hf column) and after (Finetuned Llama-b column) unlearning has occurred.\n\n\n\n\n\n\nFigure 14.13: Llama unlearning Harry Potter. Credit: Eldan and Russinovich (2023).\n\n\nEldan, Ronen, and Mark Russinovich. 2023. “Who’s Harry Potter? Approximate Unlearning in LLMs.” ArXiv Preprint abs/2310.02238. https://arxiv.org/abs/2310.02238.\n\n\n\n\nOther Uses\n\nRemoving adversarial data\nDeep learning models have previously been shown to be vulnerable to adversarial attacks, in which the attacker generates adversarial data similar to the original training data, where a human cannot tell the difference between the real and fabricated data. The adversarial data results in the model outputting incorrect predictions, which could have detrimental consequences in various applications, including healthcare diagnosis predictions. Machine unlearning has been used to unlearn the influence of adversarial data to prevent these incorrect predictions from occurring and causing any harm\n\n\n\n\n14.8.4 Homomorphic Encryption\n\nCore Idea\nHomomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, generating an encrypted result that, when decrypted, matches the result of operations performed on the plaintext. For example, multiplying two numbers encrypted with homomorphic encryption produces an encrypted product that decrypts the actual product of the two numbers. This means that data can be processed in an encrypted form, and only the resulting output needs to be decrypted, significantly enhancing data security, especially for sensitive information.\nHomomorphic encryption enables outsourced computation on encrypted data without exposing the data itself to the external party performing the operations. However, only certain computations like addition and multiplication are supported in partially homomorphic schemes. Fully homomorphic encryption (FHE) that can handle any computation is even more complex. The number of possible operations is limited before noise accumulation corrupts the ciphertext.\nTo use homomorphic encryption across different entities, carefully generated public keys must be exchanged for operations across separately encrypted data. This advanced encryption technique enables previously impossible secure computation paradigms but requires expertise to implement correctly for real-world systems.\n\n\nBenefits\nHomomorphic encryption enables machine learning model training and inference on encrypted data, ensuring that sensitive inputs and intermediate values remain confidential. This is critical in healthcare, finance, genetics, and other domains, which are increasingly relying on ML to analyze sensitive and regulated data sets containing billions of personal records.\nHomomorphic encryption thwarts attacks like model extraction and membership inference that could expose private data used in ML workflows. It provides an alternative to TEEs using hardware enclaves for confidential computing. However, current schemes have high computational overheads and algorithmic limitations that constrain real-world applications.\nHomomorphic encryption realizes the decades-old vision of secure multipartymultiparty computation by allowing computation on ciphertexts. Conceptualized in the 1970s, the first fully homomorphic cryptosystems emerged in 2009, enabling arbitrary computations. Ongoing research is making these techniques more efficient and practical.\nHomomorphic encryption shows great promise in enabling privacy-preserving machine learning under emerging data regulations. However, given constraints, one should carefully evaluate its applicability against other confidential computing approaches. Extensive resources exist to explore homomorphic encryption and track progress in easing adoption barriers.\n\n\nMechanics\n\nData Encryption: Before data is processed or sent to an ML model, it is encrypted using a homomorphic encryption scheme and public key. For example, encrypting numbers \\(x\\) and \\(y\\) generates ciphertexts \\(E(x)\\) and \\(E(y)\\).\nComputation on Ciphertext: The ML algorithm processes the encrypted data directly. For instance, multiplying the ciphertexts \\(E(x)\\) and \\(E(y)\\) generates \\(E(xy)\\). More complex model training can also be done on ciphertexts.\nResult Encryption: The result \\(E(xy)\\) remains encrypted and can only be decrypted by someone with the corresponding private key to reveal the actual product \\(xy\\).\n\nOnly authorized parties with the private key can decrypt the final outputs, protecting the intermediate state. However, noise accumulates with each operation, preventing further computation without decryption.\nBeyond healthcare, homomorphic encryption enables confidential computing for applications like financial fraud detection, insurance analytics, genetics research, and more. It offers an alternative to techniques like multipartymultiparty computation and TEEs. Ongoing research aims to improve the efficiency and capabilities.\nTools like HElib, SEAL, and TensorFlow HE provide libraries for exploring implementing homomorphic encryption in real-world machine learning pipelines.\n\n\nTradeoffs\nFor many real-time and embedded applications, fully homomorphic encryption remains impractical for the following reasons.\nComputational Overhead: Homomorphic encryption imposes very high computational overheads, often resulting in slowdowns of over 100x for real-world ML applications. This makes it impractical for many time-sensitive or resource-constrained uses. Optimized hardware and parallelization can help but not eliminate this issue.\nComplexity of Implementation The sophisticated algorithms require deep expertise in cryptography to be implemented correctly. Nuances like format compatibility with floating point ML models and scalable key management pose hurdles. This complexity hinders widespread practical adoption.\nAlgorithmic Limitations: Current schemes restrict the functions and depth of computations supported, limiting the models and data volumes that can be processed. Ongoing research is pushing these boundaries, but restrictions remain.\nHardware Acceleration: Homomorphic encryption requires specialized hardware, such as secure processors or coprocessors with TEEs, which adds design and infrastructure costs.\nHybrid Designs: Rather than encrypting entire workflows, selective application of homomorphic encryption to critical subcomponents can achieve protection while minimizing overheads.\n\n\n\n\n\n\nExercise 14.2: Homomorphic Encryption\n\n\n\n\n\nReady to unlock the power of encrypted computation? Homomorphic encryption is like a magic trick for your data! In this Colab, we’ll learn how to do calculations on secret numbers without ever revealing them. Imagine training a model on data you can’t even see – that’s the power of this mind-bending technology.\n\n\n\n\n\n\n\n14.8.5 Secure MultipartyMultiparty Communication\n\nCore Idea\nThe overarching goal of MPC is to enable different parties to jointly compute a function over their inputs while keeping those inputs private. For example, two organizations may want to collaborate on training a machine learning model by combining their respective data sets. Still, they cannot directly reveal that data due to Privacy or confidentiality constraints. MPC aims to provide protocols and techniques that allow them to achieve the benefits of pooled data for model accuracy without compromising the privacy of each organization’s sensitive data.\nAt a high level, MPC works by carefully splitting the computation into parts that each party can execute independently using their private input. The results are then combined to reveal only the final output of the function and nothing about the intermediate values. Cryptographic techniques are used to guarantee that the partial results remain private provably.\nLet’s take a simple example of an MPC protocol. One of the most basic MPC protocols is the secure addition of two numbers. Each party splits its input into random shares that are secretly distributed. They exchange the shares and locally compute the sum of the shares, which reconstructs the final sum without revealing the individual inputs. For example, if Alice has input x and Bob has input y:\n\nAlice generates random \\(x_1\\) and sets \\(x_2 = x - x_1\\)\nBob generates random \\(y_1\\) and sets \\(y_2 = y - y_1\\)\nAlice sends \\(x_1\\) to Bob, Bob sends \\(y_1\\) to Alice (keeping \\(x_2\\) and \\(y_2\\) secret)\nAlice computes \\(x_2 + y_1 = s_1\\), Bob computes \\(x_1 + y_2 = s_2\\)\n\\(s_1 + s_2 = x + y\\) is the final sum, without revealing \\(x\\) or \\(y\\).\n\nAlice’s and Bob’s individual inputs (\\(x\\) and \\(y\\)) remain private, and each party only reveals one number associated with their original inputs. The random spits ensure no information about the original numbers disclosed\nSecure Comparison: Another basic operation is a secure comparison of two numbers, determining which is greater than the other. This can be done using techniques like Yao’s Garbled Circuits, where the comparison circuit is encrypted to allow joint evaluation of the inputs without leaking them.\nSecure Matrix Multiplication: Matrix operations like multiplication are essential for machine learning. MPC techniques like additive secret sharing can be used to split matrices into random shares, compute products on the shares, and then reconstruct the result.\nSecure Model Training: Distributed machine learning training algorithms like federated averaging can be made secure using MPC. Model updates computed on partitioned data at each node are secretly shared between nodes and aggregated to train the global model without exposing individual updates.\nThe core idea behind MPC protocols is to divide the computation into steps that can be executed jointly without revealing intermediate sensitive data. This is accomplished by combining cryptographic techniques like secret sharing, homomorphic encryption, oblivious transfer, and garbled circuits. MPC protocols enable the collaborative computation of sensitive data while providing provable privacy guarantees. This privacy-preserving capability is essential for many machine learning applications today involving multiple parties that cannot directly share their raw data.\nThe main approaches used in MPC include:\n\nHomomorphic encryption: Special encryption allows computations to be carried out on encrypted data without decrypting it.\nSecret sharing: The private data is divided into random shares distributed to each party. Computations are done locally on the shares and finally reconstructed.\nOblivious transfer: A protocol where a receiver obtains a subset of data from a sender, but the sender does not know which specific data was transferred.\nGarbled circuits: The function to be computed is represented as a Boolean circuit that is encrypted (“garbled”) to allow joint evaluation without revealing inputs.\n\n\n\nTradeoffs\nWhile MPC protocols provide strong privacy guarantees, they come at a high computational cost compared to plain computations. Every secure operation, like addition, multiplication, comparison, etc., requires more processing orders than the equivalent unencrypted operation. This overhead stems from the underlying cryptographic techniques:\n\nIn partially homomorphic encryption, each computation on ciphertexts requires costly public-key operations. Fully homomorphic encryption has even higher overheads.\nSecret sharing divides data into multiple shares, so even basic operations require manipulating many shares.\nOblivious transfer and garbled circuits add masking and encryption to hide data access patterns and execution flows.\nMPC systems require extensive communication and interaction between parties to compute on shares/ciphertexts jointly.\n\nAs a result, MPC protocols can slow down computations by 3-4 orders of magnitude compared to plain implementations. This becomes prohibitively expensive for large datasets and models. Therefore, training machine learning models on encrypted data using MPC remains infeasible today for realistic dataset sizes due to the overhead. Clever optimizations and approximations are needed to make MPC practical.\nOngoing MPC research aims to close this efficiency gap through cryptographic advances, new algorithms, trusted hardware like SGX enclaves, and leveraging accelerators like GPUs/TPUs. However, in the foreseeable future, some degree of approximation and performance tradeoff is needed to scale MPC to meet the demands of real-world machine learning systems.\n\n\n\n14.8.6 Synthetic Data Generation\n\nCore Idea\nSynthetic data generation has emerged as an important privacy-preserving machine learning approach that allows models to be developed and tested without exposing real user data. The key idea is to train generative models on real-world datasets and then sample from these models to synthesize artificial data that statistically match the original data distribution but does not contain actual user information. For example, a GAN could be trained on a dataset of sensitive medical records to learn the underlying patterns and then used to sample synthetic patient data.\nThe primary challenge of synthesizing data is to ensure adversaries are unable to re-identify the original dataset. A simple approach to achieving synthetic data is adding noise to the original dataset, which still risks privacy leakage. When noise is added to data in the context of differential privacy, sophisticated mechanisms based on the data’s sensitivity are used to calibrate the amount and distribution of noise. Through these mathematically rigorous frameworks, differential Privacy generally guarantees Privacy at some level, which is the primary goal of this privacy-preserving technique. Beyond preserving privacy, synthetic data combats multiple data availability issues such as imbalanced datasets, scarce datasets, and anomaly detection.\nResearchers can freely share this synthetic data and collaborate on modeling without revealing private medical information. Well-constructed synthetic data protects Privacy while providing utility for developing accurate models. Key techniques to prevent reconstructing the original data include adding differential privacy noise during training, enforcing plausibility constraints, and using multiple diverse generative models. Here are some common approaches for generating synthetic data:\n\nGenerative Adversarial Networks (GANs): GANs are an AI algorithm used in unsupervised learning where two neural networks compete against each other in a game. Figure 14.14 is an overview of the GAN system. The generator network (big red box) is responsible for producing the synthetic data, and the discriminator network (yellow box) evaluates the authenticity of the data by distinguishing between fake data created by the generator network and the real data. The generator and discriminator networks learn and update their parameters based on the results. The discriminator acts as a metric on how similar the fake and real data are to one another. It is highly effective at generating realistic data and is a popular approach for generating synthetic data.\n\n\n\n\n\n\n\nFigure 14.14: Flowchart of GANs. Credit: Rosa and Papa (2021).\n\n\nRosa, Gustavo H. de, and João P. Papa. 2021. “A Survey on Text Generation Using Generative Adversarial Networks.” Pattern Recogn. 119 (November): 108098. https://doi.org/10.1016/j.patcog.2021.108098.\n\n\n\nVariational Autoencoders (VAEs): VAEs are neural networks capable of learning complex probability distributions and balancing data generation quality and computational efficiency. They encode data into a latent space where they learn the distribution to decode the data back.\nData Augmentation: This involves transforming existing data to create new, altered data. For example, flipping, rotating, and scaling (uniformly or non-uniformly) original images can help create a more diverse, robust image dataset before training an ML model.\nSimulations: Mathematical models can simulate real-world systems or processes to mimic real-world phenomena. This is highly useful in scientific research, urban planning, and economics.\n\n\n\nBenefits\nWhile synthetic data may be necessary due to Privacy or compliance risks, it is widely used in machine learning models when available data is of poor quality, scarce, or inaccessible. Synthetic data offers more efficient and effective development by streamlining robust model training, testing, and deployment processes. It allows researchers to share models more widely without breaching privacy laws and regulations. Collaboration between users of the same dataset will be facilitated, which will help broaden the capabilities and advancements in ML research.\nThere are several motivations for using synthetic data in machine learning:\n\nPrivacy and compliance: Synthetic data avoids exposing personal information, allowing more open sharing and collaboration. This is important when working with sensitive datasets like healthcare records or financial information.\nData scarcity: When insufficient real-world data is available, synthetic data can augment training datasets. This improves model accuracy when limited data is a bottleneck.\nModel testing: Synthetic data provides privacy-safe sandboxes for testing model performance, debugging issues, and monitoring for bias.\nData labeling: High-quality labeled training data is often scarce and expensive. Synthetic data can help auto-generate labeled examples.\n\n\n\nTradeoffs\nWhile synthetic data aims to remove any evidence of the original dataset, privacy leakage is still a risk since the synthetic data mimics the original data. The statistical information and distribution are similar, if not the same, between the original and synthetic data. By resampling from the distribution, adversaries may still be able to recover the original training samples. Due to their inherent learning processes and complexities, neural networks might accidentally reveal sensitive information about the original training data.\nA core challenge with synthetic data is the potential gap between synthetic and real-world data distributions. Despite advancements in generative modeling techniques, synthetic data may only partially capture real data’s complexity, diversity, and nuanced patterns. This can limit the utility of synthetic data for robustly training machine learning models. Rigorously evaluating synthetic data quality through adversary methods and comparing model performance to real data benchmarks helps assess and improve fidelity. However, inherently, synthetic data remains an approximation.\nAnother critical concern is the privacy risks of synthetic data. Generative models may leak identifiable information about individuals in the training data, which could enable reconstruction of private information. Emerging adversarial attacks demonstrate the challenges in preventing identity leakage from synthetic data generation pipelines. Techniques like differential Privacy can help safeguard Privacy but come with tradeoffs in data utility. There is an inherent tension between producing useful synthetic data and fully protecting sensitive training data, which must be balanced.\nAdditional pitfalls of synthetic data include amplified biases, labeling difficulties, the computational overhead of training generative models, storage costs, and failure to account for out-of-distribution novel data. While these are secondary to the core synthetic-real gap and privacy risks, they remain important considerations when evaluating the suitability of synthetic data for particular machine-learning tasks. As with any technique, the advantages of synthetic data come with inherent tradeoffs and limitations that require thoughtful mitigation strategies.\n\n\n\n14.8.7 Summary\nWhile all the techniques we have discussed thus far aim to enable privacy-preserving machine learning, they involve distinct mechanisms and tradeoffs. Factors like computational constraints, required trust assumptions, threat models, and data characteristics help guide the selection process for a particular use case. However, finding the right balance between Privacy, accuracy, and efficiency necessitates experimentation and empirical evaluation for many applications. Below is a comparison table of the key privacy-preserving machine learning techniques and their pros and cons:\n\n\n\n\n\n\n\n\nTechnique\nPros\nCons\n\n\n\n\nDifferential Privacy\nStrong formal privacy guarantees Robust to auxiliary data attacks Versatile for many data types and analyses\nAccuracy loss from noise addition Computational overhead for sensitivity analysis and noise generation\n\n\nFederated Learning\nAllows collaborative learning without sharing raw data Data remains decentralized improving security No need for encrypted computation\nIncreased communication overhead Potentially slower model convergence Uneven client device capabilities\n\n\nSecure Multi-Party Computation\nEnables joint computation on sensitive data Provides cryptographic privacy guarantees Flexible protocols for various functions\nVery high computational overhead Complexity of implementation Algorithmic constraints on function depth\n\n\nHomomorphic Encryption\nAllows computation on encrypted data Prevents intermediate state exposure\nExtremely high computational cost Complex cryptographic implementations Restrictions on function types\n\n\nSynthetic Data Generation\nEnables data sharing without leakage Mitigates data scarcity problems\nSynthetic-real gap in distributions Potential for reconstructing private data Biases and labeling challenges", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1584,7 +1584,7 @@ "section": "14.9 Conclusion", "text": "14.9 Conclusion\nMachine learning hardware security is critical as embedded ML systems are increasingly deployed in safety-critical domains like medical devices, industrial controls, and autonomous vehicles. We have explored various threats spanning hardware bugs, physical attacks, side channels, supply chain risks, etc. Defenses like TEEs, Secure Boot, PUFs, and hardware security modules provide multilayer protection tailored for resource-constrained embedded devices.\nHowever, continual vigilance is essential to track emerging attack vectors and address potential vulnerabilities through secure engineering practices across the hardware lifecycle. As ML and embedded ML spread, maintaining rigorous security foundations that match the field’s accelerating pace of innovation remains imperative.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1593,9 +1593,9 @@ "href": "contents/privacy_security/privacy_security.html#sec-security-and-privacy-resource", "title": "14  Security & Privacy", "section": "14.10 Resources", - "text": "14.10 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nSecurity.\nPrivacy.\nMonitoring after Deployment.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 14.1\nExercise 14.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "14.10 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nSecurity.\nPrivacy.\nMonitoring after Deployment.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 14.1\nVideo 14.2\nVideo 14.3\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 14.1\nExercise 14.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Advanced Topics", "14  Security & Privacy" ] }, @@ -1606,7 +1606,7 @@ "section": "", "text": "15.1 Introduction\nMachine learning models are increasingly used to automate decisions in high-stakes social domains like healthcare, criminal justice, and employment. However, without deliberate care, these algorithms can perpetuate biases, breach privacy, or cause other harm. For instance, a loan approval model solely trained on data from high-income neighborhoods could disadvantage applicants from lower-income areas. This motivates the need for responsible machine learning - creating fair, accountable, transparent, and ethical models.\nSeveral core principles underlie responsible ML. Fairness ensures models do not discriminate based on gender, race, age, and other attributes. Explainability enables humans to interpret model behaviors and improve transparency. Robustness and safety techniques prevent vulnerabilities like adversarial examples. Rigorous testing and validation help reduce unintended model weaknesses or side effects.\nImplementing responsible ML presents both technical and ethical challenges. Developers must grapple with defining fairness mathematically, balancing competing objectives like accuracy vs interpretability, and securing quality training data. Organizations must also align incentives, policies, and culture to uphold ethical AI.\nThis chapter will equip you to critically evaluate AI systems and contribute to developing beneficial and ethical machine learning applications by covering the foundations, methods, and real-world implications of responsible ML. The responsible ML principles discussed are crucial knowledge as algorithms mediate more aspects of human society.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1617,7 +1617,7 @@ "section": "15.2 Definition", "text": "15.2 Definition\nResponsible AI is about developing AI that positively impacts society under human ethics and values. There is no universally agreed-upon definition of “responsible AI,” but here is a summary of how it is commonly described. Responsible AI refers to designing, developing, and deploying artificial intelligence systems in an ethical, socially beneficial way. The core goal is to create trustworthy, unbiased, fair, transparent, accountable, and safe AI. While there is no canonical definition, responsible AI is generally considered to encompass principles such as:\n\nFairness: Avoiding biases, discrimination, and potential harm to certain groups or populations\nExplainability: Enabling humans to understand and interpret how AI models make decisions\nTransparency: Openly communicating how AI systems operate, are built, and are evaluated\nAccountability: Having processes to determine responsibility and liability for AI failures or negative impacts\nRobustness: Ensuring AI systems are secure, reliable, and behave as intended\nPrivacy: Protecting sensitive user data and adhering to privacy laws and ethics\n\nPutting these principles into practice involves technical techniques, corporate policies, governance frameworks, and moral philosophy. There are also ongoing debates around defining ambiguous concepts like fairness and determining how to balance competing objectives.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1626,9 +1626,9 @@ "href": "contents/responsible_ai/responsible_ai.html#principles-and-concepts", "title": "15  Responsible AI", "section": "15.3 Principles and Concepts", - "text": "15.3 Principles and Concepts\n\n15.3.1 Transparency and Explainability\nMachine learning models are often criticized as mysterious “black boxes” - opaque systems where it’s unclear how they arrived at particular predictions or decisions. For example, an AI system called COMPAS used to assess criminal recidivism risk in the US was found to be racially biased against black defendants. Still, the opacity of the algorithm made it difficult to understand and fix the problem. This lack of transparency can obscure biases, errors, and deficiencies.\nExplaining model behaviors helps engender trust from the public and domain experts and enables identifying issues to address. Interpretability techniques like LIME, Shapley values, and saliency maps empower humans to understand and validate model logic. Laws like the EU’s GDPR also mandate transparency, which requires explainability for certain automated decisions. Overall, transparency and explainability are critical pillars of responsible AI.\n\n\n15.3.2 Fairness, Bias, and Discrimination\nML models trained on historically biased data often perpetuate and amplify those prejudices. Healthcare algorithms have been shown to disadvantage black patients by underestimating their needs (Obermeyer et al. 2019). Facial recognition needs to be more accurate for women and people of color. Such algorithmic discrimination can negatively impact people’s lives in profound ways.\n\nObermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations.” Science 366 (6464): 447–53. https://doi.org/10.1126/science.aax2342.\nDifferent philosophical perspectives also exist on fairness - for example, is it fairer to treat all individuals equally or try to achieve equal outcomes for groups? Ensuring fairness requires proactively detecting and mitigating biases in data and models. However, achieving perfect fairness is tremendously difficult due to contrasting mathematical definitions and ethical perspectives. Still, promoting algorithmic fairness and non-discrimination is a key responsibility in AI development.\n\n\n15.3.3 Privacy and Data Governance\nMaintaining individuals’ privacy is an ethical obligation and legal requirement for organizations deploying AI systems. Regulations like the EU’s GDPR mandate data privacy protections and rights, such as the ability to access and delete one’s data.\nHowever, maximizing the utility and accuracy of data for training models can conflict with preserving privacy - modeling disease progression could benefit from access to patients’ full genomes, but sharing such data widely violates privacy.\nResponsible data governance involves carefully anonymizing data, controlling access with encryption, getting informed consent from data subjects, and collecting the minimum data needed. Honoring privacy is challenging but critical as AI capabilities and adoption expand.\n\n\n15.3.4 Safety and Robustness\nPutting AI systems into real-world operation requires ensuring they are safe, reliable, and robust, especially for human interaction scenarios. Self-driving cars from Uber and Tesla have been involved in deadly crashes due to unsafe behaviors.\nAdversarial attacks that subtly alter input data can also fool ML models and cause dangerous failures if systems are not resistant. Deepfakes represent another emerging threat area.\nBelow is a deepfake video of Barack Obama that went viral a few years ago.\n\nPromoting safety requires extensive testing, risk analysis, human oversight, and designing systems that combine multiple weak models to avoid single points of failure. Rigorous safety mechanisms are essential for the responsible deployment of capable AI.\n\n\n15.3.5 Accountability and Governance\nWhen AI systems eventually fail or produce harmful outcomes, mechanisms must exist to address resultant issues, compensate affected parties, and assign responsibility. Both corporate accountability policies and government regulations are indispensable for responsible AI governance. For instance, Illinois’ Artificial Intelligence Video Interview Act requires companies to disclose and obtain consent for AI video analysis, promoting accountability.\nWithout clear accountability, even harms caused unintentionally could go unresolved, furthering public outrage and distrust. Oversight boards, impact assessments, grievance redress processes, and independent audits promote responsible development and deployment.", + "text": "15.3 Principles and Concepts\n\n15.3.1 Transparency and Explainability\nMachine learning models are often criticized as mysterious “black boxes” - opaque systems where it’s unclear how they arrived at particular predictions or decisions. For example, an AI system called COMPAS used to assess criminal recidivism risk in the US was found to be racially biased against black defendants. Still, the opacity of the algorithm made it difficult to understand and fix the problem. This lack of transparency can obscure biases, errors, and deficiencies.\nExplaining model behaviors helps engender trust from the public and domain experts and enables identifying issues to address. Interpretability techniques like LIME, Shapley values, and saliency maps empower humans to understand and validate model logic. Laws like the EU’s GDPR also mandate transparency, which requires explainability for certain automated decisions. Overall, transparency and explainability are critical pillars of responsible AI.\n\n\n15.3.2 Fairness, Bias, and Discrimination\nML models trained on historically biased data often perpetuate and amplify those prejudices. Healthcare algorithms have been shown to disadvantage black patients by underestimating their needs (Obermeyer et al. 2019). Facial recognition needs to be more accurate for women and people of color. Such algorithmic discrimination can negatively impact people’s lives in profound ways.\n\nObermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations.” Science 366 (6464): 447–53. https://doi.org/10.1126/science.aax2342.\nDifferent philosophical perspectives also exist on fairness - for example, is it fairer to treat all individuals equally or try to achieve equal outcomes for groups? Ensuring fairness requires proactively detecting and mitigating biases in data and models. However, achieving perfect fairness is tremendously difficult due to contrasting mathematical definitions and ethical perspectives. Still, promoting algorithmic fairness and non-discrimination is a key responsibility in AI development.\n\n\n15.3.3 Privacy and Data Governance\nMaintaining individuals’ privacy is an ethical obligation and legal requirement for organizations deploying AI systems. Regulations like the EU’s GDPR mandate data privacy protections and rights, such as the ability to access and delete one’s data.\nHowever, maximizing the utility and accuracy of data for training models can conflict with preserving privacy - modeling disease progression could benefit from access to patients’ full genomes, but sharing such data widely violates privacy.\nResponsible data governance involves carefully anonymizing data, controlling access with encryption, getting informed consent from data subjects, and collecting the minimum data needed. Honoring privacy is challenging but critical as AI capabilities and adoption expand.\n\n\n15.3.4 Safety and Robustness\nPutting AI systems into real-world operation requires ensuring they are safe, reliable, and robust, especially for human interaction scenarios. Self-driving cars from Uber and Tesla have been involved in deadly crashes due to unsafe behaviors.\nAdversarial attacks that subtly alter input data can also fool ML models and cause dangerous failures if systems are not resistant. Deepfakes represent another emerging threat area.\nVideo 15.1 is a deepfake video of Barack Obama that went viral a few years ago.\n\n\n\n\n\n\nVideo 15.1: Fake Obama\n\n\n\n\n\n\nPromoting safety requires extensive testing, risk analysis, human oversight, and designing systems that combine multiple weak models to avoid single points of failure. Rigorous safety mechanisms are essential for the responsible deployment of capable AI.\n\n\n15.3.5 Accountability and Governance\nWhen AI systems eventually fail or produce harmful outcomes, mechanisms must exist to address resultant issues, compensate affected parties, and assign responsibility. Both corporate accountability policies and government regulations are indispensable for responsible AI governance. For instance, Illinois’ Artificial Intelligence Video Interview Act requires companies to disclose and obtain consent for AI video analysis, promoting accountability.\nWithout clear accountability, even harms caused unintentionally could go unresolved, furthering public outrage and distrust. Oversight boards, impact assessments, grievance redress processes, and independent audits promote responsible development and deployment.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1639,7 +1639,7 @@ "section": "15.4 Cloud, Edge & Tiny ML", "text": "15.4 Cloud, Edge & Tiny ML\nWhile these principles broadly apply across AI systems, certain responsible AI considerations are unique or pronounced when dealing with machine learning on embedded devices versus traditional server-based modeling. Therefore, we present a high-level taxonomy comparing responsible AI considerations across cloud, edge, and TinyML systems.\n\n15.4.1 Summary\nThe table below summarizes how responsible AI principles manifest differently across cloud, edge, and TinyML architectures and how core considerations tie into their unique capabilities and limitations. Each environment’s constraints and tradeoffs shape how we approach transparency, accountability, governance, and other pillars of responsible AI.\n\n\n\n\n\n\n\n\n\nPrinciple\nCloud ML\nEdge ML\nTinyML\n\n\n\n\nExplainability\nComplex models supported\nLightweight required\nSevere limits\n\n\nFairness\nBroad data available\nOn-device biases\nLimited data labels\n\n\nPrivacy\nCloud data vulnerabilities\nMore sensitive data\nData dispersed\n\n\nSafety\nHacking threats\nReal-world interaction\nAutonomous devices\n\n\nAccountability\nCorporate policies\nSupply chain issues\nComponent tracing\n\n\nGovernance\nExternal oversight feasible\nSelf-governance needed\nProtocol constraints\n\n\n\n\n\n15.4.2 Explainability\nFor cloud-based machine learning, explainability techniques can leverage significant compute resources, enabling complex methods like SHAP values or sampling-based approaches to interpret model behaviors. For example, Microsoft’s InterpretML toolkit provides explainability techniques tailored for cloud environments.\nHowever, edge ML operates on resource-constrained devices, requiring more lightweight explainability methods that can run locally without excessive latency. Techniques like LIME (Ribeiro, Singh, and Guestrin 2016) approximate model explanations using linear models or decision trees to avoid expensive computations, which makes them ideal for resource-constrained devices. However, LIME requires training hundreds to even thousands of models to generate good explanations, which is often infeasible given edge computing constraints. In contrast, saliency-based methods are often much faster in practice, only requiring a single forward pass through the network to estimate feature importance. This greater efficiency makes such methods better suited to edge devices with limited compute resources where low-latency explanations are critical.\nGiven tiny hardware capabilities, embedded systems pose the most significant challenges for explainability. More compact models and limited data make inherent model transparency easier. Explaining decisions may not be feasible on high-size and power-optimized microcontrollers. DARPA’s Transparent Computing program aims to develop extremely low overhead explainability, especially for TinyML devices like sensors and wearables.\n\n\n15.4.3 Fairness\nFor cloud machine learning, vast datasets and computing power enable detecting biases across large heterogeneous populations and mitigating them through techniques like re-weighting data samples. However, biases may emerge from the broad behavioral data used to train cloud models. Amazon’s Fairness Flow framework helps assess cloud ML fairness.\nEdge ML relies on limited on-device data, making analyzing biases across diverse groups harder. However, edge devices interact closely with individuals, providing an opportunity to adapt locally for fairness. Google’s Federated Learning distributes model training across devices to incorporate individual differences.\nTinyML poses unique challenges for fairness with highly dispersed specialized hardware and minimal training data. Bias testing is difficult across diverse devices. Collecting representative data from many devices to mitigate bias has scale and privacy hurdles. DARPA’s Assured Neuro Symbolic Learning and Reasoning (ANSR) efforts are geared toward developing fairness techniques given extreme hardware constraints.\n\n\n15.4.4 Safety\nKey safety risks for cloud ML include model hacking, data poisoning, and malware disrupting cloud services. Robustness techniques like adversarial training, anomaly detection, and diversified models aim to harden cloud ML against attacks. Redundancy can help prevent single points of failure.\nEdge ML and TinyML interact with the physical world, so reliability and safety validation are critical. Rigorous testing platforms like Foretellix synthetically generate edge scenarios to validate safety. TinyML safety is magnified by autonomous devices with limited supervision. TinyML safety often relies on collective coordination - swarms of drones maintain safety through redundancy. Physical control barriers also constrain unsafe TinyML device behaviors.\nIn summary, safety is crucial but manifests differently in each domain. Cloud ML guards against hacking, edge ML interacts physically, so reliability is key, and TinyML leverages distributed coordination for safety. Understanding the nuances guides appropriate safety techniques.\n\n\n15.4.5 Accountability\nCloud ML’s accountability centers on corporate practices like responsible AI committees, ethical charters, and processes to address harmful incidents. Third-party audits and external government oversight promote cloud ML accountability.\nEdge ML accountability is more complex with distributed devices and supply chain fragmentation. Companies are accountable for devices, but components come from various vendors. Industry standards help coordinate edge ML accountability across stakeholders.\nWith TinyML, accountability mechanisms must be traced across long, complex supply chains of integrated circuits, sensors, and other hardware. TinyML certification schemes help track component provenance. Trade associations should ideally promote shared accountability for ethical TinyML.\n\n\n15.4.6 Governance\nOrganizations institute internal governance for cloud ML, such as ethics boards, audits, and model risk management. But external governance also oversees cloud ML, like regulations on bias and transparency such as the AI Bill of Rights, General Data Protection Regulation (GDPR), and California Consumer Protection Act (CCPA). Third-party auditing supports cloud ML governance.\nEdge ML is more decentralized, requiring responsible self-governance by developers and companies deploying models locally. Industry associations coordinate governance across edge ML vendors, and open software helps align incentives for ethical edge ML.\nExtreme decentralization and complexity make external governance infeasible with TinyML. TinyML relies on protocols and standards for self-governance baked into model design and hardware. Cryptography enables the provable trustworthiness of TinyML devices.\n\n\n15.4.7 Privacy\nFor cloud ML, vast amounts of user data are concentrated in the cloud, creating risks of exposure through breaches. Differential privacy techniques add noise to cloud data to preserve privacy. Strict access controls and encryption protect cloud data at rest and in transit.\nEdge ML moves data processing onto user devices, reducing aggregated data collection but increasing potential sensitivity as personal data resides on the device. Apple uses on-device ML and differential privacy to train models while minimizing data sharing. Data anonymization and secure enclaves protect on-device data.\nTinyML distributes data across many resource-constrained devices, making centralized breaches unlikely and making scale anonymization challenging. Data minimization and using edge devices as intermediaries help TinyML privacy.\nSo, while cloud ML must protect expansive centralized data, edge ML secures sensitive on-device data, and TinyML aims for minimal distributed data sharing due to constraints. While privacy is vital throughout, techniques must match the environment. Understanding nuances allows for selecting appropriate privacy preservation approaches.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1650,7 +1650,7 @@ "section": "15.5 Technical Aspects", "text": "15.5 Technical Aspects\n\n15.5.1 Detecting and Mitigating Bias\nA large body of work has demonstrated that machine learning models can exhibit bias, from underperforming people of a certain identity to making decisions that limit groups’ access to important resources (Buolamwini and Gebru 2018).\nEnsuring fair and equitable treatment for all groups affected by machine learning systems is crucial as these models increasingly impact people’s lives in areas like lending, healthcare, and criminal justice. We typically evaluate model fairness by considering “subgroup attributes” unrelated to the prediction task that capture identities like race, gender, or religion. For example, in a loan default prediction model, subgroups could include race, gender, or religion. When models are trained naively to maximize accuracy, they often ignore subgroup performance. However, this can negatively impact marginalized communities.\nTo illustrate, imagine a model predicting loan repayment where the plusses (+’s) represent repayment and the circles (O’s) represent default, as shown in Figure 15.1. The optimal accuracy would be correctly classifying all of Group A while misclassifying some of Group B’s creditworthy applicants as defaults. If positive classifications allow access loans, Group A would receive many more loans—which would naturally result in a biased outcome.\n\n\n\n\n\n\nFigure 15.1: Fairness and accuracy.\n\n\n\nAlternatively, correcting the biases against Group B would likely increase “false positives” and reduce accuracy for Group A. Or, we could train separate models focused on maximizing true positives for each group. However, this would require explicitly using sensitive attributes like race in the decision process.\nAs we see, there are inherent tensions around priorities like accuracy versus subgroup fairness and whether to explicitly account for protected classes. Reasonable people can disagree on the appropriate tradeoffs. Constraints around costs and implementation options further complicate matters. Overall, ensuring the fair and ethical use of machine learning involves navigating these complex challenges.\nThus, the fairness literature has proposed three main fairness metrics for quantifying how fair a model performs over a dataset (Hardt, Price, and Srebro 2016). Given a model h and a dataset D consisting of (x,y,s) samples, where x is the data features, y is the label, and s is the subgroup attribute, and we assume there are simply two subgroups a and b, we can define the following.\n\nDemographic Parity asks how accurate a model is for each subgroup. In other words, P(h(X) = Y S = a) = P(h(X) = Y S = b)\nEqualized Odds asks how precise a model is on positive and negative samples for each subgroup. P(h(X) = y S = a, Y = y) = P(h(X) = y S = b, Y = y)\nEquality of Opportunity is a special case of equalized odds that only asks how precise a model is on positive samples. This is relevant in cases such as resource allocation, where we care about how positive (i.e., resource-allocated) labels are distributed across groups. For example, we care that an equal proportion of loans are given to both men and women. P(h(X) = 1 S = a, Y = 1) = P(h(X) = 1 S = b, Y = 1)\n\nNote: These definitions often take a narrow view when considering binary comparisons between two subgroups. Another thread of fair machine learning research focusing on multicalibration and multiaccuracy considers the interactions between an arbitrary number of identities, acknowledging the inherent intersectionality of individual identities in the real world (Hébert-Johnson et al. 2018).\n\nHébert-Johnson, Úrsula, Michael P. Kim, Omer Reingold, and Guy N. Rothblum. 2018. “Multicalibration: Calibration for the (Computationally-Identifiable) Masses.” In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, edited by Jennifer G. Dy and Andreas Krause, 80:1944–53. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v80/hebert-johnson18a.html.\n\nContext Matters\nBefore making any technical decisions to develop an unbiased ML algorithm, we need to understand the context surrounding our model. Here are some of the key questions to think about:\n\nWho will this model make decisions for?\nWho is represented in the training data?\nWho is represented, and who is missing at the table of engineers, designers, and managers?\n\nWhat sort of long-lasting impacts could this model have? For example, will it impact an individual’s financial security at a generational scale, such as determining college admissions or admitting a loan for a house?\n\nWhat historical and systematic biases are present in this setting, and are they present in the training data the model will generalize from?\n\nUnderstanding a system’s social, ethical, and historical background is critical to preventing harm and should inform decisions throughout the model development lifecycle. After understanding the context, one can make various technical decisions to remove bias. First, one must decide what fairness metric is the most appropriate criterion for optimizing. Next, there are generally three main areas where one can intervene to debias an ML system.\nFirst, preprocessing is when one balances a dataset to ensure fair representation or even increases the weight on certain underrepresented groups to ensure the model performs well. Second, in processing attempts to modify the training process of an ML system to ensure it prioritizes fairness. This can be as simple as adding a fairness regularizer (Lowy et al. 2021) to training an ensemble of models and sampling from them in a specific manner (Agarwal et al. 2018).\n\nLowy, Andrew, Rakesh Pavan, Sina Baharlouei, Meisam Razaviyayn, and Ahmad Beirami. 2021. “Fermi: Fair Empirical Risk Minimization via Exponential Rényi Mutual Information.”\n\nAgarwal, Alekh, Alina Beygelzimer, Miroslav Dudı́k, John Langford, and Hanna M. Wallach. 2018. “A Reductions Approach to Fair Classification.” In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, edited by Jennifer G. Dy and Andreas Krause, 80:60–69. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v80/agarwal18a.html.\n\nAlghamdi, Wael, Hsiang Hsu, Haewon Jeong, Hao Wang, Peter Michalak, Shahab Asoodeh, and Flavio Calmon. 2022. “Beyond Adult and COMPAS: Fair Multi-Class Prediction via Information Projection.” Adv. Neur. In. 35: 38747–60.\n\nHardt, Moritz, Eric Price, and Nati Srebro. 2016. “Equality of Opportunity in Supervised Learning.” In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, edited by Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett, 3315–23. https://proceedings.neurips.cc/paper/2016/hash/9d2682367c3935defcb1f9e247a97c0d-Abstract.html.\nFinally, post-processing debases a model after the fact, taking a trained model and modifying its predictions in a specific manner to ensure fairness is preserved (Alghamdi et al. 2022; Hardt, Price, and Srebro 2016). Post-processing builds on the preprocessing and in-processing steps by providing another opportunity to address bias and fairness issues in the model after it has already been trained.\nThe three-step process of preprocessing, in-processing, and post-processing provides a framework for intervening at different stages of model development to mitigate issues around bias and fairness. While preprocessing and in-processing focus on data and training, post-processing allows for adjustments after the model has been fully trained. Together, these three approaches give multiple opportunities to detect and remove unfair bias.\n\n\nThoughtful Deployment\nThe breadth of existing fairness definitions and debiasing interventions underscores the need for thoughtful assessment before deploying ML systems. As ML researchers and developers, responsible model development requires proactively educating ourselves on the real-world context, consulting domain experts and end-users, and centering harm prevention.\nRather than seeing fairness considerations as a box to check, we must deeply engage with the unique social implications and ethical tradeoffs around each model we build. Every technical choice about datasets, model architectures, evaluation metrics, and deployment constraints embeds values. By broadening our perspective beyond narrow technical metrics, carefully evaluating tradeoffs, and listening to impacted voices, we can work to ensure our systems expand opportunity rather than encode bias.\nThe path forward lies not in an arbitrary debiasing checklist but in a commitment to understanding and upholding our ethical responsibility at each step. This commitment starts with proactively educating ourselves and consulting others rather than just going through the motions of a fairness checklist. It requires engaging deeply with ethical tradeoffs in our technical choices, evaluating impacts on different groups, and listening to those voices most impacted.\nUltimately, responsible and ethical AI systems do not come from checkbox debiasing but from upholding our duty to assess harms, broaden perspectives, understand tradeoffs, and ensure we provide opportunity for all groups. This ethical responsibility should drive every step.\nThe connection between the paragraphs is that the first paragraph establishes the need for a thoughtful assessment of fairness issues rather than a checkbox approach. The second paragraph then expands on what that thoughtful assessment looks like in practice—engaging with tradeoffs, evaluating impacts on groups, and listening to impacted voices. Finally, the last paragraph refers to avoiding an “arbitrary debiasing checklist” and committing to ethical responsibility through assessment, understanding tradeoffs, and providing opportunity.\n\n\n\n15.5.2 Preserving Privacy\nRecent incidents have demonstrated how AI models can memorize sensitive user data in ways that violate privacy. For example, as shown in Figure XXX below, Stable Diffusion’s art generations were found to mimic identifiable artists’ styles and replicate existing photos, concerning many (Ippolito et al. 2023). These risks are amplified with personalized ML systems deployed in intimate environments like homes or wearables.\nImagine if a smart speaker uses our conversations to improve the quality of service to end users who genuinely want it. Still, others could violate privacy by trying to extract what the speaker “remembers.” Figure 15.2 below shows how diffusion models can memorize and generate individual training examples (Ippolito et al. 2023).\n\n\n\n\n\n\nFigure 15.2: Diffusion models memorizing samples from training data. Credit: Ippolito et al. (2023).\n\n\nIppolito, Daphne, Florian Tramer, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher Choquette Choo, and Nicholas Carlini. 2023. “Preventing Generation of Verbatim Memorization in Language Models Gives a False Sense of Privacy.” In Proceedings of the 16th International Natural Language Generation Conference, 5253–70. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.inlg-main.3.\n\n\nAdversaries can use these memorization capabilities and train models to detect if specific training data influenced a target model. For example, membership inference attacks train a secondary model that learns to detect a change in the target model’s outputs when making inferences over data it was trained on versus not trained on (Shokri et al. 2017).\n\nShokri, Reza, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. “Membership Inference Attacks Against Machine Learning Models.” In 2017 IEEE Symposium on Security and Privacy (SP), 3–18. IEEE; IEEE. https://doi.org/10.1109/sp.2017.41.\n\nAbadi, Martin, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. “Deep Learning with Differential Privacy.” In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308–18. CCS ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2976749.2978318.\nML devices are especially vulnerable because they are often personalized on user data and are deployed in even more intimate settings such as the home. Private machine learning techniques have evolved to establish safeguards against adversaries, as mentioned in the Security and Privacy chapter to combat these privacy issues. Methods like differential privacy add mathematical noise during training to obscure individual data points’ influence on the model. Popular techniques like DP-SGD (Abadi et al. 2016) also clip gradients to limit what the model leaks about the data. Still, users should also be able to delete the impact of their data after the fact.\n\n\n15.5.3 Machine Unlearning\nWith ML devices personalized to individual users and then deployed to remote edges without connectivity, a challenge arises—how can models responsively “forget” data points after deployment? If users request their data be removed from a personalized model, the lack of connectivity makes retraining infeasible. Thus, efficient on-device data forgetting is necessary but poses hurdles.\nInitial unlearning approaches faced limitations in this context. Given the resource constraints, retrieving models from scratch on the device to forget data points proves inefficient or even impossible. Fully retraining also requires retaining all the original training data on the device, which brings its own security and privacy risks. Common machine unlearning techniques (Bourtoule et al. 2021) for remote embedded ML systems fail to enable responsive, secure data removal.\n\nBourtoule, Lucas, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. 2021. “Machine Unlearning.” In 2021 IEEE Symposium on Security and Privacy (SP), 141–59. IEEE; IEEE. https://doi.org/10.1109/sp40001.2021.00019.\nHowever, newer methods show promise in modifying models to approximately forget data [?] without full retraining. While the accuracy loss from avoiding full rebuilds is modest, guaranteeing data privacy should still be the priority when handling sensitive user information ethically. Even slight exposure to private data can violate user trust. As ML systems become deeply personalized, efficiency and privacy must be enabled from the start—not afterthoughts.\nRecent policy discussions which include the European Union’s General Data, Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the Act on the Protection of Personal Information (APPI), and Canada’s proposed Consumer Privacy Protection Act (CPPA), require the deletion of private information. These policies, coupled with AI incidents like Stable Diffusion memorizing artist data, have underscored the ethical need for users to delete their data from models after training.\nThe right to remove data arises from privacy concerns around corporations or adversaries misusing sensitive user information. Machine unlearning refers to removing the influence of specific points from an already-trained model. Naively, this involves full retraining without the deleted data. However, connectivity constraints often make retraining infeasible for ML systems personalized and deployed to remote edges. If a smart speaker learns from private home conversations, retaining access to delete that data is important.\nAlthough limited, methods are evolving to enable efficient approximations of retraining for unlearning. By modifying models’ inference time, they can mimic “forgetting” data without full access to training data. However, most current techniques are restricted to simple models, still have resource costs, and trade some accuracy. Though methods are evolving, enabling efficient data removal and respecting user privacy remains imperative for responsible TinyML deployment.\n\n\n15.5.4 Adversarial Examples and Robustness\nMachine learning models, especially deep neural networks, have a well-documented Achilles heel: they often break when even tiny perturbations are made to their inputs (Szegedy et al. 2014). This surprising fragility highlights a major robustness gap threatening real-world deployment in high-stakes domains. It also opens the door for adversarial attacks designed to fool models deliberately.\nMachine learning models can exhibit surprising brittleness—minor input tweaks can cause shocking malfunctions, even in state-of-the-art deep neural networks (Szegedy et al. 2014). This unpredictability around out-of-sample data underscores gaps in model generalization and robustness. Given the growing ubiquity of ML, it also enables adversarial threats that weaponize models’ blindspots.\nDeep neural networks demonstrate an almost paradoxical dual nature - human-like proficiency in training distributions coupled with extreme fragility to tiny input perturbations (Szegedy et al. 2014). This adversarial vulnerability gap highlights gaps in standard ML procedures and threats to real-world reliability. At the same time, it can be exploited: attackers can find model-breaking points humans wouldn’t perceive.\n\nSzegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014. “Intriguing Properties of Neural Networks.” In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, edited by Yoshua Bengio and Yann LeCun. http://arxiv.org/abs/1312.6199.\nFigure 15.3 includes an example of a small meaningless perturbation that changes a model prediction. This fragility has real-world impacts: lack of robustness undermines trust in deploying models for high-stakes applications like self-driving cars or medical diagnosis. Moreover, the vulnerability leads to security threats: attackers can deliberately craft adversarial examples that are perceptually indistinguishable from normal data but cause model failures.\n\n\n\n\n\n\nFigure 15.3: Perturbation effect on prediction. Credit: Microsoft.\n\n\n\nFor instance, past work shows successful attacks that trick models for tasks like NSFW detection (Bhagoji et al. 2018), ad-blocking (Tramèr et al. 2019), and speech recognition (Carlini et al. 2016). While errors in these domains already pose security risks, the problem extends beyond IT security. Recently, adversarial robustness has been proposed as an additional performance metric by approximating worst-case behavior.\n\nBhagoji, Arjun Nitin, Warren He, Bo Li, and Dawn Song. 2018. “Practical Black-Box Attacks on Deep Neural Networks Using Efficient Query Mechanisms.” In Proceedings of the European Conference on Computer Vision (ECCV), 154–69.\n\nTramèr, Florian, Pascal Dupré, Gili Rusak, Giancarlo Pellegrino, and Dan Boneh. 2019. “AdVersarial: Perceptual Ad Blocking Meets Adversarial Machine Learning.” In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2005–21. ACM. https://doi.org/10.1145/3319535.3354222.\n\nCarlini, Nicholas, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, and Wenchao Zhou. 2016. “Hidden Voice Commands.” In 25th USENIX Security Symposium (USENIX Security 16), 513–30.\nThe surprising model fragility highlighted above casts doubt on real-world reliability and opens the door to adversarial manipulation. This growing vulnerability underscores several needs. First, moral robustness evaluations are essential for quantifying model vulnerabilities before deployment. Approximating worst-case behavior surfaces blindspots.\nSecond, effective defenses across domains must be developed to close these robustness gaps. With security on the line, developers cannot ignore the threat of attacks exploiting model weaknesses. Moreover, we cannot afford any fragility-induced failures for safety-critical applications like self-driving vehicles and medical diagnosis. Lives are at stake.\nFinally, the research community continues mobilizing rapidly in response. Interest in adversarial machine learning has exploded as attacks reveal the need to bridge the robustness gap between synthetic and real-world data. Conferences now commonly feature defenses for securing and stabilizing models. The community recognizes that model fragility is a critical issue that must be addressed through robustness testing, defense development, and ongoing research. By surfacing blindspots and responding with principled defenses, we can work to ensure reliability and safety for machine learning systems, especially in high-stakes domains.\n\n\n15.5.5 Building Interpretable Models\nAs models are deployed more frequently in high-stakes settings, practitioners, developers, downstream end-users, and increasing regulation have highlighted the need for explainability in machine learning. The goal of many interpretability and explainability methods is to provide practitioners with more information about the models’ overall behavior or the behavior given a specific input. This allows users to decide whether or not a model’s output or prediction is trustworthy.\nSuch analysis can help developers debug models and improve performance by pointing out biases, spurious correlations, and failure modes of models. In cases where models can surpass human performance on a task, interpretability can help users and researchers better understand relationships in their data and previously unknown patterns.\nThere are many classes of explainability/interpretability methods, including post hoc explainability, inherent interpretability, and mechanistic interpretability. These methods aim to make complex machine learning models more understandable and ensure users can trust model predictions, especially in critical settings. By providing transparency into model behavior, explainability techniques are an important tool for developing safe, fair, and reliable AI systems.\n\nPost Hoc Explainability\nPost hoc explainability methods typically explain the output behavior of a black-box model on a specific input. Popular methods include counterfactual explanations, feature attribution methods, and concept-based explanations.\nCounterfactual explanations, also frequently called algorithmic recourse, “If X had not occurred, Y would not have occurred” (Wachter, Mittelstadt, and Russell 2017). For example, consider a person applying for a bank loan whose application is rejected by a model. They may ask their bank for recourse or how to change to be eligible for a loan. A counterfactual explanation would tell them which features they need to change and by how much such that the model’s prediction changes.\n\nWachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” SSRN Electronic Journal 31: 841. https://doi.org/10.2139/ssrn.3063289.\n\nSelvaraju, Ramprasaath R., Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.” In 2017 IEEE International Conference on Computer Vision (ICCV), 618–26. IEEE. https://doi.org/10.1109/iccv.2017.74.\n\nSmilkov, Daniel, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 2017. “Smoothgrad: Removing Noise by Adding Noise.” ArXiv Preprint abs/1706.03825. https://arxiv.org/abs/1706.03825.\n\nRibeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “” Why Should i Trust You?” Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44.\n\nLundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, edited by Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, 4765–74. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.\nFeature attribution methods highlight the input features that are important or necessary for a particular prediction. For a computer vision model, this would mean highlighting the individual pixels that contributed most to the predicted label of the image. Note that these methods do not explain how those pixels/features impact the prediction, only that they do. Common methods include input gradients, GradCAM (Selvaraju et al. 2017), SmoothGrad (Smilkov et al. 2017), LIME (Ribeiro, Singh, and Guestrin 2016), and SHAP (Lundberg and Lee 2017).\nBy providing examples of changes to input features that would alter a prediction (counterfactuals) or indicating the most influential features for a given prediction (attribution), these post hoc explanation techniques shed light on model behavior for individual inputs. This granular transparency helps users determine whether they can trust and act upon specific model outputs.\nConcept-based explanations aim to explain model behavior and outputs using a pre-defined set of semantic concepts (e.g., the model recognizes scene class “bedroom” based on the presence of concepts “bed” and “pillow”). Recent work shows that users often prefer these explanations to attribution and example-based explanations because they “resemble human reasoning and explanations” (Vikram V. Ramaswamy et al. 2023b). Popular concept-based explanation methods include TCAV (Cai et al. 2019), Network Dissection (Bau et al. 2017), and interpretable basis decomposition (Zhou et al. 2018).\n\nRamaswamy, Vikram V, Sunnie SY Kim, Ruth Fong, and Olga Russakovsky. 2023b. “UFO: A Unified Method for Controlling Understandability and Faithfulness Objectives in Concept-Based Explanations for CNNs.” ArXiv Preprint abs/2303.15632. https://arxiv.org/abs/2303.15632.\n\nCai, Carrie J., Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel Smilkov, Martin Wattenberg, et al. 2019. “Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making.” In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, edited by Jennifer G. Dy and Andreas Krause, 80:2673–82. Proceedings of Machine Learning Research. ACM. https://doi.org/10.1145/3290605.3300234.\n\nBau, David, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. “Network Dissection: Quantifying Interpretability of Deep Visual Representations.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3319–27. IEEE. https://doi.org/10.1109/cvpr.2017.354.\n\nZhou, Bolei, Yiyou Sun, David Bau, and Antonio Torralba. 2018. “Interpretable Basis Decomposition for Visual Explanation.” In Proceedings of the European Conference on Computer Vision (ECCV), 119–34.\n\nRamaswamy, Vikram V., Sunnie S. Y. Kim, Ruth Fong, and Olga Russakovsky. 2023a. “Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability.” In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10932–41. IEEE. https://doi.org/10.1109/cvpr52729.2023.01052.\nNote that these methods are extremely sensitive to the size and quality of the concept set, and there is a tradeoff between their accuracy and faithfulness and their interpretability or understandability to humans (Vikram V. Ramaswamy et al. 2023a). However, by mapping model predictions to human-understandable concepts, concept-based explanations can provide transparency into the reasoning behind model outputs.\n\n\nInherent Interpretability\nInherently interpretable models are constructed such that their explanations are part of the model architecture and are thus naturally faithful, which sometimes makes them preferable to post-hoc explanations applied to black-box models, especially in high-stakes domains where transparency is imperative (Rudin 2019). Often, these models are constrained so that the relationships between input features and predictions are easy for humans to follow (linear models, decision trees, decision sets, k-NN models), or they obey structural knowledge of the domain, such as monotonicity (Gupta et al. 2016), causality, or additivity (Lou et al. 2013; Beck and Jackman 1998).\n\nRudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15. https://doi.org/10.1038/s42256-019-0048-x.\n\nGupta, Maya, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin Canini, Alexander Mangylov, Wojciech Moczydlowski, and Alexander Van Esbroeck. 2016. “Monotonic Calibrated Interpolated Look-up Tables.” The Journal of Machine Learning Research 17 (1): 3790–3836.\n\nLou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. 2013. “Accurate Intelligible Models with Pairwise Interactions.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, edited by Inderjit S. Dhillon, Yehuda Koren, Rayid Ghani, Ted E. Senator, Paul Bradley, Rajesh Parekh, Jingrui He, Robert L. Grossman, and Ramasamy Uthurusamy, 623–31. ACM. https://doi.org/10.1145/2487575.2487579.\n\nBeck, Nathaniel, and Simon Jackman. 1998. “Beyond Linearity by Default: Generalized Additive Models.” Am. J. Polit. Sci. 42 (2): 596. https://doi.org/10.2307/2991772.\n\nKoh, Pang Wei, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. 2020. “Concept Bottleneck Models.” In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, 119:5338–48. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v119/koh20a.html.\n\nChen, Chaofan, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and Jonathan Su. 2019. “This Looks Like That: Deep Learning for Interpretable Image Recognition.” In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, edited by Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, 8928–39. https://proceedings.neurips.cc/paper/2019/hash/adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html.\nHowever, more recent works have relaxed the restrictions on inherently interpretable models, using black-box models for feature extraction and a simpler inherently interpretable model for classification, allowing for faithful explanations that relate high-level features to prediction. For example, Concept Bottleneck Models (Koh et al. 2020) predict a concept set c that is passed into a linear classifier. ProtoPNets (Chen et al. 2019) dissect inputs into linear combinations of similarities to prototypical parts from the training set.\n\n\nMechanistic Interpretability\nMechanistic interpretability methods seek to reverse engineer neural networks, often analogizing them to how one might reverse engineer a compiled binary or how neuroscientists attempt to decode the function of individual neurons and circuits in brains. Most research in mechanistic interpretability views models as a computational graph (Geiger et al. 2021), and circuits are subgraphs with distinct functionality (Wang and Zhan 2019). Current approaches to extracting circuits from neural networks and understanding their functionality rely on human manual inspection of visualizations produced by circuits (Olah et al. 2020).\n\nGeiger, Atticus, Hanson Lu, Thomas Icard, and Christopher Potts. 2021. “Causal Abstractions of Neural Networks.” In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, edited by Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, 9574–86. https://proceedings.neurips.cc/paper/2021/hash/4f5c422f4d49a5a807eda27434231040-Abstract.html.\n\nWang, LingFeng, and YaQing Zhan. 2019. “A Conceptual Peer Review Model for arXiv and Other Preprint Databases.” Learn. Publ. 32 (3): 213–19. https://doi.org/10.1002/leap.1229.\n\nOlah, Chris, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter. 2020. “Zoom in: An Introduction to Circuits.” Distill 5 (3): e00024–001. https://doi.org/10.23915/distill.00024.001.\n\nDavarzani, Samaneh, David Saucier, Purva Talegaonkar, Erin Parker, Alana Turner, Carver Middleton, Will Carroll, et al. 2023. “Closing the Wearable Gap: Footankle Kinematic Modeling via Deep Learning Models Based on a Smart Sock Wearable.” Wearable Technologies 4. https://doi.org/10.1017/wtc.2023.3.\nAlternatively, some approaches build sparse autoencoders that encourage neurons to encode disentangled interpretable features (Davarzani et al. 2023). This field is much newer than existing areas in explainability and interpretability, and as such, most works are generally exploratory rather than solution-oriented.\nThere are many problems in mechanistic interpretability, including the polysemanticity of neurons and circuits, the inconvenience and subjectivity of human labeling, and the exponential search space for identifying circuits in large models with billions or trillions of neurons.\n\n\nChallenges and Considerations\nAs methods for interpreting and explaining models progress, it is important to note that humans overtrust and misuse interpretability tools (Kaur et al. 2020) and that a user’s trust in a model due to an explanation can be independent of the correctness of the explanations (Lakkaraju and Bastani 2020). As such, it is necessary that aside from assessing the faithfulness/correctness of explanations, researchers must also ensure that interpretability methods are developed and deployed with a specific user in mind and that user studies are performed to evaluate their efficacy and usefulness in practice.\n\nKaur, Harmanpreet, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. “Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, edited by Regina Bernhaupt, Florian ’Floyd’Mueller, David Verweij, Josh Andres, Joanna McGrenere, Andy Cockburn, Ignacio Avellino, et al., 1–14. ACM. https://doi.org/10.1145/3313831.3376219.\n\nLakkaraju, Himabindu, and Osbert Bastani. 2020. “”How Do i Fool You?”: Manipulating User Trust via Misleading Black Box Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 79–85. ACM. https://doi.org/10.1145/3375627.3375833.\nFurthermore, explanations should be tailored to the user’s expertise, the task they are using the explanation for and the corresponding minimal amount of information required for the explanation to be useful to prevent information overload.\nWhile interpretability/explainability are popular areas in machine learning research, very few works study their intersection with TinyML and edge computing. Given that a significant application of TinyML is healthcare, which often requires high transparency and interpretability, existing techniques must be tested for scalability and efficiency concerning edge devices. Many methods rely on extra forward and backward passes, and some even require extensive training in proxy models, which are infeasible on resource-constrained microcontrollers.\nThat said, explainability methods can be highly useful in developing models for edge devices, as they can give insights into how input data and models can be compressed and how representations may change post-compression. Furthermore, many interpretable models are often smaller than their black-box counterparts, which could benefit TinyML applications.\n\n\n\n15.5.6 Monitoring Model Performance\nWhile developers may train models that seem adversarially robust, fair, and interpretable before deployment, it is imperative that both the users and the model owners continue to monitor the model’s performance and trustworthiness during the model’s full lifecycle. Data is frequently changing in practice, which can often result in distribution shifts. These distribution shifts can profoundly impact the model’s vanilla predictive performance and its trustworthiness (fairness, robustness, and interpretability) in real-world data.\nFurthermore, definitions of fairness frequently change with time, such as what society considers a protected attribute, and the expertise of the users asking for explanations may also change.\nTo ensure that models keep up to date with such changes in the real world, developers must continually evaluate their models on current and representative data and standards and update models when necessary.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1661,7 +1661,7 @@ "section": "15.6 Implementation Challenges", "text": "15.6 Implementation Challenges\n\n15.6.1 Organizational and Cultural Structures\nWhile innovation and regulation are often seen as having competing interests, many countries have found it necessary to provide oversight as AI systems expand into more sectors. As illustrated in Figure 15.4, this oversight has become crucial as these systems continue permeating various industries and impacting people’s lives (see Human-Centered AI, Chapter 8 “Government Interventions and Regulations”.\n\n\n\n\n\n\nFigure 15.4: How various groups impact human-centered AI. Credit: Shneiderman (2020).\n\n\nShneiderman, Ben. 2020. “Bridging the Gap Between Ethics and Practice: Guidelines for Reliable, Safe, and Trustworthy Human-Centered AI Systems.” ACM Trans. Interact. Intell. Syst. 10 (4): 1–31. https://doi.org/10.1145/3419764.\n\n\nAmong these are:\n\nCanada’s Responsible Use of Artificial Intelligence\nThe European Union’s General Data Protection Regulation (GDPR)\nThe European Commission’s White Paper on Artificial Intelligence: a European approach to excellence and trust\nThe UK’s Information Commissioner’s Office and Alan Turing Institute’s Consultation on Explaining AI Decisions Guidance co-badged guidance by the individuals affected by them.\n\n\n\n15.6.2 Obtaining Quality and Representative Data\nAs discussed in the Data Engineering chapter, responsible AI design must occur at all pipeline stages, including data collection. This begs the question: what does it mean for data to be high-quality and representative? Consider the following scenarios that hinder the representativeness of data:\n\nSubgroup Imbalance\nThis is likely what comes to mind when hearing “representative data.” Subgroup imbalance means the dataset contains relatively more data from one subgroup than another. This imbalance can negatively affect the downstream ML model by causing it to overfit a subgroup of people while performing poorly on another.\nOne example consequence of subgroup imbalance is racial discrimination in facial recognition technology (Buolamwini and Gebru 2018); commercial facial recognition algorithms have up to 34% worse error rates on darker-skinned females than lighter-skinned males.\n\nBuolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Conference on Fairness, Accountability and Transparency, 77–91. PMLR.\nNote that data imbalance goes both ways, and subgroups can also be harmful overrepresented in the dataset. For example, the Allegheny Family Screening Tool (AFST) predicts the likelihood that a child will eventually be removed from a home. The AFST produces disproportionate scores for different subgroups, one of the reasons being that it is trained on historically biased data, sourced from juvenile and adult criminal legal systems, public welfare agencies, and behavioral health agencies and programs.\n\n\nQuantifying Target Outcomes\nThis occurs in applications where the ground-truth label cannot be measured or is difficult to represent in a single quantity. For example, an ML model in a mobile wellness application may want to predict individual stress levels. The true stress labels themselves are impossible to obtain directly and must be inferred from other biosignals, such as heart rate variability and user self-reported data. In these situations, noise is built into the data by design, making this a challenging ML task.\n\n\nDistribution Shift\nData may no longer represent a task if a major external event causes the data source to change drastically. The most common way to think about distribution shifts is with respect to time; for example, data on consumer shopping habits collected pre-covid may no longer be present in consumer behavior today.\nThe transfer causes another form of distribution shift. For instance, when applying a triage system that was trained on data from one hospital to another, a distribution shift may occur if the two hospitals are very different.#\n\n\nGathering Data\nA reasonable solution for many of the above problems with non-representative or low-quality data is to collect more; we can collect more data targeting an underrepresented subgroup or from the target hospital to which our model might be transferred. However, for some reasons, gathering more data is an inappropriate or infeasible solution for the task at hand.\n\nData collection can be harmful. This is the paradox of exposure, the situation in which those who stand to significantly gain from their data being collected are also those who are put at risk by the collection process (D’ignazio and Klein (2023), Chapter 4). For example, collecting more data on non-binary individuals may be important for ensuring the fairness of the ML application, but it also puts them at risk, depending on who is collecting the data and how (whether the data is easily identifiable, contains sensitive content, etc.).\nData collection can be costly. In some domains, such as healthcare, obtaining data can be costly in terms of time and money.\nBiased data collection. Electronic Health Records is a huge data source for ML-driven healthcare applications. Issues of subgroup representation aside, the data itself may be collected in a biased manner. For example, negative language (“nonadherent,” “unwilling”) is disproportionately used on black patients (Himmelstein, Bates, and Zhou 2022).\n\n\nD’ignazio, Catherine, and Lauren F Klein. 2023. Data Feminism. MIT press.\n\nHimmelstein, Gracie, David Bates, and Li Zhou. 2022. “Examination of Stigmatizing Language in the Electronic Health Record.” JAMA Network Open 5 (1): e2144967. https://doi.org/10.1001/jamanetworkopen.2021.44967.\nWe conclude with several additional strategies for maintaining data quality: improving understanding of the data, data exploration, and intr. First, fostering a deeper understanding of the data is crucial. This can be achieved through the implementation of standardized labels and measures of data quality, such as in the Data Nutrition Project.\nCollaborating with organizations responsible for collecting data helps ensure the data is interpreted correctly. Second, employing effective tools for data exploration is important. Visualization techniques and statistical analyses can reveal issues with the data. Finally, establishing a feedback loop within the ML pipeline is essential for understanding the real-world implications of the data. Metrics, such as fairness measures, allow us to define “data quality” in the context of the downstream application; improving fairness may directly improve the quality of the predictions that the end users receive.\n\n\n\n15.6.3 Balancing Accuracy and Other Objectives\nMachine learning models are often evaluated on accuracy alone, but this single metric cannot fully capture model performance and tradeoffs for responsible AI systems. Other ethical dimensions, such as fairness, robustness, interpretability, and privacy, may compete with pure predictive accuracy during model development. For instance, inherently interpretable models such as small decision trees or linear classifiers with simplified features intentionally trade some accuracy for transparency in the model behavior and predictions. While these simplified models achieve lower accuracy by not capturing all the complexity in the dataset, improved interpretability builds trust by enabling direct analysis by human practitioners.\nAdditionally, certain techniques meant to improve adversarial robustness, such as adversarial training examples or dimensionality reduction, can degrade the accuracy of clean validation data. In sensitive applications like healthcare, focusing narrowly on state-of-the-art accuracy carries ethical risks if it allows models to rely more on spurious correlations that introduce bias or use opaque reasoning. Therefore, the appropriate performance objectives depend greatly on the sociotechnical context.\nMethodologies like Value Sensitive Design provide frameworks for formally evaluating the priorities of various stakeholders within the real-world deployment system. These elucidate tensions between values like accuracy, interpretation, ility, and fail and redness, which can then guide responsible tradeoff decisions. For a medical diagnosis system, achieving the highest accuracy may not be the singular goal - improving transparency to build practitioner trust or reducing bias towards minority groups could justify small losses in accuracy. Analyzing the sociotechnical context is key for setting these objectives.\nBy taking a holistic view, we can responsibly balance accuracy with other ethical objectives for model success. Ongoing performance monitoring along multiple dimensions is crucial as the system evolves after deployment.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1672,7 +1672,7 @@ "section": "15.7 Ethical Considerations in AI Design", "text": "15.7 Ethical Considerations in AI Design\nWe must discuss at least some of the many ethical issues at stake in designing and applying AI systems and diverse frameworks for approaching these issues, including those from AI safety, Human-Computer Interaction (HCI), and Science, Technology, and Society (STS).\n\n15.7.1 AI Safety and Value Alignment\nIn 1960, Norbert Weiner wrote, “’if we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively… we had better be quite sure that the purpose put into the machine is the purpose which we desire” (Wiener 1960).\n\nWiener, Norbert. 1960. “Some Moral and Technical Consequences of Automation: As Machines Learn They May Develop Unforeseen Strategies at Rates That Baffle Their Programmers.” Science 131 (3410): 1355–58. https://doi.org/10.1126/science.131.3410.1355.\n\nRussell, Stuart. 2021. “Human-Compatible Artificial Intelligence.” Human-Like Machine Intelligence, 3–23.\nIn recent years, as the capabilities of deep learning models have achieved, and sometimes even surpassed, human abilities, the issue of creating AI systems that act in accord with human intentions instead of pursuing unintended or undesirable goals has become a source of concern (Russell 2021). Within the field of AI safety, a particular goal concerns “value alignment,” or the problem of how to code the “right” purpose into machines Human-Compatible Artificial Intelligence. Present AI research assumes we know the objectives we want to achieve and “studies the ability to achieve objectives, not the design of those objectives.”\nHowever, complex real-world deployment contexts make explicitly defining “the right purpose” for machines difficult, requiring frameworks for responsible and ethical goal-setting. Methodologies like Value Sensitive Design provide formal mechanisms to surface tensions between stakeholder values and priorities.\nBy taking a holistic sociotechnical view, we can better ensure intelligent systems pursue objectives that align with broad human intentions rather than maximizing narrow metrics like accuracy alone. Achieving this in practice remains an open and critical research question as AI capabilities advance rapidly.\nThe absence of this alignment can lead to several AI safety issues, as have been documented in a variety of deep learning models. A common feature of systems that optimize for an objective is that variables not directly included in the objective may be set to extreme values to help optimize for that objective, leading to issues characterized as specification gaming, reward hacking, etc., in reinforcement learning (RL).\nIn recent years, a particularly popular implementation of RL has been models pre-trained using self-supervised learning and fine-tuned reinforcement learning from human feedback (RLHF) (Christiano et al. 2017). Ngo 2022 (Ngo, Chan, and Mindermann 2022) argues that by rewarding models for appearing harmless and ethical while also maximizing useful outcomes, RLHF could encourage the emergence of three problematic properties: situationally aware reward hacking, where policies exploit human fallibility to gain high reward, misaligned internally-represented goals that generalize beyond the RLHF fine-tuning distribution, and power-seeking strategies.\n\nChristiano, Paul F., Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. “Deep Reinforcement Learning from Human Preferences.” In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, edited by Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, 4299–4307. https://proceedings.neurips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html.\n\nNgo, Richard, Lawrence Chan, and Sören Mindermann. 2022. “The Alignment Problem from a Deep Learning Perspective.” ArXiv Preprint abs/2209.00626. https://arxiv.org/abs/2209.00626.\n\nVan Noorden, Richard. 2016. “ArXiv Preprint Server Plans Multimillion-Dollar Overhaul.” Nature 534 (7609): 602–2. https://doi.org/10.1038/534602a.\nSimilarly, Van Noorden (2016) outlines six concrete problems for AI safety, including avoiding negative side effects, avoiding reward hacking, scalable oversight for aspects of the objective that are too expensive to be frequently evaluated during training, safe exploration strategies that encourage creativity while preventing harm, and robustness to distributional shift in unseen testing environments.\n\n\n15.7.2 Autonomous Systems and Control [and Trust]\nThe consequences of autonomous systems that act independently of human oversight and often outside human judgment have been well documented across several industries and use cases. Most recently, the California Department of Motor Vehicles suspended Cruise’s deployment and testing permits for its autonomous vehicles citing “unreasonable risks to public safety”. One such accident occurred when a vehicle struck a pedestrian who stepped into a crosswalk after the stoplight had turned green, and the vehicle was allowed to proceed. In 2018, a pedestrian crossing the street with her bike was killed when a self-driving Uber car, which was operating in autonomous mode, failed to accurately classify her moving body as an object to be avoided.\nAutonomous systems beyond self-driving vehicles are also susceptible to such issues, with potentially graver consequences, as remotely-powered drones are already reshaping warfare. While such incidents bring up important ethical questions regarding who should be held responsible when these systems fail, they also highlight the technical challenges of giving full control of complex, real-world tasks to machines.\nAt its core, there is a tension between human and machine autonomy. Engineering and computer science disciplines have tended to focus on machine autonomy. For example, as of 2019, a search for the word “autonomy” in the Digital Library of the Association for Computing Machinery (ACM) reveals that of the top 100 most cited papers, 90% are on machine autonomy (Calvo et al. 2020). In an attempt to build systems for the benefit of humanity, these disciplines have taken, without question, increasing productivity, efficiency, and automation as primary strategies for benefiting humanity.\n\nMcCarthy, John. 1981. “Epistemological Problems of Artificial Intelligence.” In Readings in Artificial Intelligence, 459–65. Elsevier. https://doi.org/10.1016/b978-0-934613-03-3.50035-0.\nThese goals put machine automation at the forefront, often at the expense of the human. This approach suffers from inherent challenges, as noted since the early days of AI through the Frame problem and qualification problem, which formalizes the observation that it is impossible to specify all the preconditions needed for a real-world action to succeed (McCarthy 1981).\nThese logical limitations have given rise to mathematical approaches such as Responsibility-sensitive safety (RSS) (Shalev-Shwartz, Shammah, and Shashua 2017), which is aimed at breaking down the end goal of an automated driving system (namely safety) into concrete and checkable conditions that can be rigorously formulated in mathematical terms. The goal of RSS is that those safety rules guarantee ADS safety in the rigorous form of mathematical proof. However, such approaches tend towards using automation to address the problems of automation and are susceptible to many of the same issues.\n\nShalev-Shwartz, Shai, Shaked Shammah, and Amnon Shashua. 2017. “On a Formal Model of Safe and Scalable Self-Driving Cars.” ArXiv Preprint abs/1708.06374. https://arxiv.org/abs/1708.06374.\n\nFriedman, Batya. 1996. “Value-Sensitive Design.” Interactions 3 (6): 16–23. https://doi.org/10.1145/242485.242493.\n\nPeters, Dorian, Rafael A. Calvo, and Richard M. Ryan. 2018. “Designing for Motivation, Engagement and Wellbeing in Digital Experience.” Front. Psychol. 9 (May): 797. https://doi.org/10.3389/fpsyg.2018.00797.\n\nRyan, Richard M., and Edward L. Deci. 2000. “Self-Determination Theory and the Facilitation of Intrinsic Motivation, Social Development, and Well-Being.” Am. Psychol. 55 (1): 68–78. https://doi.org/10.1037/0003-066x.55.1.68.\nAnother approach to combating these issues is to focus on the human-centered design of interactive systems that incorporate human control. Value-sensitive design (Friedman 1996) described three key design factors for a user interface that impact autonomy, including system capability, complexity, misrepresentation, and fluidity. A more recent model, called METUX (A Model for Motivation, Engagement, and Thriving in the User Experience), leverages insights from Self-determination Theory (SDT) in Psychology to identify six distinct spheres of technology experience that contribute to the design systems that promote well-being and human flourishing (Peters, Calvo, and Ryan 2018). SDT defines autonomy as acting by one’s goals and values, which is distinct from the use of autonomy as simply a synonym for either independence or being in control (Ryan and Deci 2000).\nCalvo 2020 elaborates on METUX and its six “spheres of technology experience” in the context of AI-recommender systems (Calvo et al. 2020). They propose these spheres—adoption, Interface, Tasks, Behavior, Life, and Society—as a way of organizing thinking and evaluation of technology design in order to appropriately capture contradictory and downstream impacts on human autonomy when interacting with AI systems.\n\nCalvo, Rafael A, Dorian Peters, Karina Vold, and Richard M Ryan. 2020. “Supporting Human Autonomy in AI Systems: A Framework for Ethical Enquiry.” Ethics of Digital Well-Being: A Multidisciplinary Approach, 31–54.\n\n\n15.7.3 Economic Impacts on Jobs, Skills, Wages\nA major concern of the current rise of AI technologies is widespread unemployment. As AI systems’ capabilities expand, many fear these technologies will cause an absolute loss of jobs as they replace current workers and overtake alternative employment roles across industries. However, changing economic landscapes at the hands of automation is not new, and historically, have been found to reflect patterns of displacement rather than replacement (Shneiderman 2022)—Chapter 4. In particular, automation usually lowers costs and increases quality, greatly increasing access and demand. The need to serve these growing markets pushes production, creating new jobs.\n\n———. 2022. Human-Centered AI. Oxford University Press.\nFurthermore, studies have found that attempts to achieve “lights-out” automation – productive and flexible automation with a minimal number of human workers – have been unsuccessful. Attempts to do so have led to what the MIT Work of the Future taskforce has termed “zero-sum automation”, in which process flexibility is sacrificed for increased productivity.\nIn contrast, the task force proposes a “positive-sum automation” approach in which flexibility is increased by designing technology that strategically incorporates humans where they are very much needed, making it easier for line employees to train and debug robots, using a bottom-up approach to identifying what tasks should be automated; and choosing the right metrics for measuring success (see MIT’s Work of the Future).\nHowever, the optimism of the high-level outlook does not preclude individual harm, especially to those whose skills and jobs will be rendered obsolete by automation. Public and legislative pressure, as well as corporate social responsibility efforts, will need to be directed at creating policies that share the benefits of automation with workers and result in higher minimum wages and benefits.\n\n\n15.7.4 Scientific Communication and AI Literacy\nA 1993 survey of 3000 North American adults’ beliefs about the “electronic thinking machine” revealed two primary perspectives of the early computer: the “beneficial tool of man” perspective and the “awesome thinking machine” perspective. The attitudes contributing to the “awesome thinking machine” view in this and other studies revealed a characterization of computers as “intelligent brains, smarter than people, unlimited, fast, mysterious, and frightening” (Martin 1993). These fears highlight an easily overlooked component of responsible AI, especially amidst the rush to commercialize such technologies: scientific communication that accurately communicates the capabilities and limitations of these systems while providing transparency about the limitations of experts’ knowledge about these systems.\n\nMartin, C. Dianne. 1993. “The Myth of the Awesome Thinking Machine.” Commun. ACM 36 (4): 120–33. https://doi.org/10.1145/255950.153587.\n\nHandlin, Oscar. 1965. “Science and Technology in Popular Culture.” Daedalus-Us., 156–70.\nAs AI systems’ capabilities expand beyond most people’s comprehension, there is a natural tendency to assume the kinds of apocalyptic worlds painted by our media. This is partly due to the apparent difficulty of assimilating scientific information, even in technologically advanced cultures, which leads to the products of science being perceived as magic—“understandable only in terms of what it did, not how it worked” (Handlin 1965).\nWhile tech companies should be held responsible for limiting grandiose claims and not falling into cycles of hype, research studying scientific communication, especially concerning (generative) AI, will also be useful in tracking and correcting public understanding of these technologies. An analysis of the Scopus scholarly database found that such research is scarce, with only a handful of papers mentioning both “science communication” and “artificial intelligence” (Schäfer 2023).\n\nSchäfer, Mike S. 2023. “The Notorious GPT: Science Communication in the Age of Artificial Intelligence.” Journal of Science Communication 22 (02): Y02. https://doi.org/10.22323/2.22020402.\n\nLindgren, Simon. 2023. Handbook of Critical Studies of Artificial Intelligence. Edward Elgar Publishing.\n\nNg, Davy Tsz Kit, Jac Ka Lok Leung, Kai Wah Samuel Chu, and Maggie Shen Qiao. 2021. “AI Literacy: Definition, Teaching, Evaluation and Ethical Issues.” Proceedings of the Association for Information Science and Technology 58 (1): 504–9.\nResearch that exposes the perspectives, frames, and images of the future promoted by academic institutions, tech companies, stakeholders, regulators, journalists, NGOs, and others will also help to identify potential gaps in AI literacy among adults (Lindgren 2023). Increased focus on AI literacy from all stakeholders will be important in helping people whose skills are rendered obsolete by AI automation (Ng et al. 2021).\n“But even those who never acquire that understanding need assurance that there is a connection between the goals of science and their welfare, and above all, that the scientist is not a man altogether apart but one who shares some of their value.” (Handlin, 1965)", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1683,7 +1683,7 @@ "section": "15.8 Conclusion", "text": "15.8 Conclusion\nResponsible artificial intelligence is crucial as machine learning systems exert growing influence across healthcare, employment, finance, and criminal justice sectors. While AI promises immense benefits, thoughtlessly designed models risk perpetrating harm through biases, privacy violations, unintended behaviors, and other pitfalls.\nUpholding principles of fairness, explainability, accountability, safety, and transparency enables the development of ethical AI aligned with human values. However, implementing these principles involves surmounting complex technical and social challenges around detecting dataset biases, choosing appropriate model tradeoffs, securing quality training data, and more. Frameworks like value-sensitive design guide balancing accuracy versus other objectives based on stakeholder needs.\nLooking forward, advancing responsible AI necessitates continued research and industry commitment. More standardized benchmarks are required to compare model biases and robustness. As personalized TinyML expands, enabling efficient transparency and user control for edge devices warrants focus. Revised incentive structures and policies must encourage deliberate, ethical development before reckless deployment. Education around AI literacy and its limitations will further contribute to public understanding.\nResponsible methods underscore that while machine learning offers immense potential, thoughtless application risks adverse consequences. Cross-disciplinary collaboration and human-centered design are imperative so AI can promote broad social benefit. The path ahead lies not in an arbitrary checklist but in a steadfast commitment to understand and uphold our ethical responsibility at each step. By taking conscientious action, the machine learning community can lead AI toward empowering all people equitably and safely.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1692,9 +1692,9 @@ "href": "contents/responsible_ai/responsible_ai.html#sec-responsible-ai-resource", "title": "15  Responsible AI", "section": "15.9 Resources", - "text": "15.9 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nWhat am I building? What is the goal?\nWho is the audience?\nWhat are the consequences?\nResponsible Data Collection.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "15.9 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nWhat am I building? What is the goal?\nWho is the audience?\nWhat are the consequences?\nResponsible Data Collection.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 15.1\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Advanced Topics", "15  Responsible AI" ] }, @@ -1705,7 +1705,7 @@ "section": "", "text": "16.1 Introduction\nThe rapid advancements in artificial intelligence (AI) and machine learning (ML) have led to many beneficial applications and optimizations for performance efficiency. However, the remarkable growth of AI comes with a significant yet often overlooked cost: its environmental impact. The most recent report released by the IPCC, the international body leading scientific assessments of climate change and its impacts, emphasized the pressing importance of tackling climate change. Without immediate efforts to decrease global \\(\\textrm{CO}_2\\) emissions by at least 43 percent before 2030, we exceed global warming of 1.5 degrees Celsius (Winkler et al. 2022). This could initiate positive feedback loops, pushing temperatures even higher. Next to environmental issues, the United Nations recognized 17 Sustainable Development Goals (SDGs), in which AI can play an important role, and vice versa, play an important role in the development of AI systems. As the field continues expanding, considering sustainability is crucial.\nAI systems, particularly large language models like GPT-3 and computer vision models like DALL-E 2, require massive amounts of computational resources for training. For example, GPT-3 was estimated to consume 1,300 megawatt-hours of electricity, which is equal to 1,450 average US households in an entire month (Maslej et al. 2023), or put another way, it consumed enough energy to supply an average US household for 120 years! This immense energy demand stems primarily from power-hungry data centers with servers running intense computations to train these complex neural networks for days or weeks.\nCurrent estimates indicate that the carbon emissions produced from developing a single, sophisticated AI model can equal the emissions over the lifetime of five standard gasoline-powered vehicles (Strubell, Ganesh, and McCallum 2019). A significant portion of the electricity presently consumed by data centers is generated from nonrenewable sources such as coal and natural gas, resulting in data centers contributing around 1% of total worldwide carbon emissions. This is comparable to the emissions from the entire airline sector. This immense carbon footprint demonstrates the pressing need to transition to renewable power sources such as solar and wind to operate AI development.\nAdditionally, even small-scale AI systems deployed to edge devices as part of TinyML have environmental impacts that should not be ignored (Prakash, Stewart, et al. 2023). The specialized hardware required for AI has an environmental toll from natural resource extraction and manufacturing. GPUs, CPUs, and chips like TPUs depend on rare earth metals whose mining and processing generate substantial pollution. The production of these components also has its energy demands. Furthermore, collecting, storing, and preprocessing data used to train both small- and large-scale models comes with environmental costs, further exacerbating the sustainability implications of ML systems.\nThus, while AI promises innovative breakthroughs in many fields, sustaining progress requires addressing sustainability challenges. AI can continue advancing responsibly by optimizing models’ efficiency, exploring alternative specialized hardware and renewable energy sources for data centers, and tracking its overall environmental impact.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1716,7 +1716,7 @@ "section": "", "text": "Winkler, Harald, Franck Lecocq, Hans Lofgren, Maria Virginia Vilariño, Sivan Kartha, and Joana Portugal-Pereira. 2022. “Examples of Shifting Development Pathways: Lessons on How to Enable Broader, Deeper, and Faster Climate Action.” Climate Action 1 (1). https://doi.org/10.1007/s44168-022-00026-1.\n\nMaslej, Nestor, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, et al. 2023. “Artificial Intelligence Index Report 2023.” ArXiv Preprint abs/2310.03715. https://arxiv.org/abs/2310.03715.\n\n\n\nPrakash, Shvetank, Matthew Stewart, Colby Banbury, Mark Mazumder, Pete Warden, Brian Plancher, and Vijay Janapa Reddi. 2023. “Is TinyML Sustainable? Assessing the Environmental Impacts of Machine Learning on Microcontrollers.” ArXiv Preprint. https://arxiv.org/abs/2301.11899.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1727,7 +1727,7 @@ "section": "16.2 Social and Ethical Responsibility", "text": "16.2 Social and Ethical Responsibility\nThe environmental impact of AI is not just a technical issue but also an ethical and social one. As AI becomes more integrated into our lives and industries, its sustainability becomes increasingly critical.\n\n16.2.1 Ethical Considerations\nThe scale of AI’s environmental footprint raises profound ethical questions about the responsibilities of AI developers and companies to minimize their carbon emissions and energy usage. As the creators of AI systems and technologies that can have sweeping global impacts, developers have an ethical obligation to consciously integrate environmental stewardship into their design process, even if sustainability comes at the cost of some efficiency gains.\nThere is a clear and present need for us to have open and honest conversations about AI’s environmental tradeoffs earlier in the development lifecycle. Researchers should feel empowered to voice concerns if organizational priorities do not align with ethical goals, as in the case of the open letter to pause giant AI experiments.\nAdditionally, there is an increasing need for AI companies to scrutinize their contributions to climate change and environmental harm. Large tech firms are responsible for the cloud infrastructure, data center energy demands, and resource extraction required to power today’s AI. Leadership should assess whether organizational values and policies promote sustainability, from hardware manufacturing through model training pipelines.\nFurthermore, more than voluntary self-regulation may be needed– -governments may need to introduce new regulations aimed at sustainable AI standards and practices if we hope to curb the projected energy explosion of ever-larger models. Reported metrics like computing usage, carbon footprint, and efficiency benchmarks could hold organizations accountable.\nThrough ethical principles, company policies, and public rules, AI technologists and corporations have a profound duty to our planet to ensure the responsible and sustainable advancement of technology positioned to transform modern society radically. We owe it to future generations to get this right.\n\n\n16.2.2 Long-term Sustainability\nThe massive projected expansion of AI raises urgent concerns about its long-term sustainability. As AI software and applications rapidly increase in complexity and usage across industries, demand for computing power and infrastructure will skyrocket exponentially in the coming years.\nTo put the scale of projected growth in perspective, the total computing capacity required for training AI models saw an astonishing 350,000x increase from 2012 to 2019 (R. Schwartz et al. 2020). Researchers forecast over an order of magnitude growth each year moving forward as personalized AI assistants, autonomous technology, precision medicine tools, and more are developed. Similar trends are estimated for embedded ML systems, with an estimated 2.5 billion AI-enabled edge devices deployed by 2030.\nManaging this expansion level requires software and hardware-focused breakthroughs in efficiency and renewable integration from AI engineers and scientists. On the software side, novel techniques in model optimization, distillation, pruning, low-precision numerics, knowledge sharing between systems, and other areas must become widespread best practices to curb energy needs. For example, realizing even a 50% reduced computational demand per capability doubling would have massive compounding on total energy.\nOn the hardware infrastructure side, due to increasing costs of data transfer, storage, cooling, and space, continuing today’s centralized server farm model at data centers is likely infeasible long-term (Lannelongue, Grealey, and Inouye 2021). Exploring alternative decentralized computing options around “edge AI” on local devices or within telco networks can alleviate scaling pressures on power-hungry hyper scale data centers. Likewise, the shift towards carbon-neutral, hybrid renewable energy sources powering leading cloud provider data centers worldwide will be essential.\n\nLannelongue, Loı̈c, Jason Grealey, and Michael Inouye. 2021. “Green Algorithms: Quantifying the Carbon Footprint of Computation.” Adv. Sci. 8 (12): 2100707. https://doi.org/10.1002/advs.202100707.\n\n\n16.2.3 AI for Environmental Good\nWhile much focus goes on AI’s sustainability challenges, these powerful technologies provide unique solutions to combat climate change and drive environmental progress. For example, ML can continuously optimize smart power grids to improve renewable integration and electricity distribution efficiency across networks (Zhang, Han, and Deng 2018). Models can ingest the real-time status of a power grid and weather forecasts to allocate and shift sources responding to supply and demand.\n\nZhang, Dongxia, Xiaoqing Han, and Chunyu Deng. 2018. “Review on the Research and Practice of Deep Learning and Reinforcement Learning in Smart Grids.” CSEE Journal of Power and Energy Systems 4 (3): 362–70. https://doi.org/10.17775/cseejpes.2018.00520.\n\nLam, Remi, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, et al. 2023. “Learning Skillful Medium-Range Global Weather Forecasting.” Science 382 (6677): 1416–21. https://doi.org/10.1126/science.adi2336.\n\nKurth, Thorsten, Shashank Subramanian, Peter Harrington, Jaideep Pathak, Morteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, and Anima Anandkumar. 2023. “FourCastNet: Accelerating Global High-Resolution Weather Forecasting Using Adaptive Fourier Neural Operators.” In Proceedings of the Platform for Advanced Scientific Computing Conference, 1–11. ACM. https://doi.org/10.1145/3592979.3593412.\nFine-tuned neural networks have also proven remarkably effective at next-generation weather forecasting (Lam et al. 2023) and climate modeling (Kurth et al. 2023). They can rapidly analyze massive volumes of climate data to boost extreme event preparation and resource planning for hurricanes, floods, droughts, and more. Climate researchers have achieved state-of-the-art storm path accuracy by combining AI simulations with traditional numerical models.\nAI also enables better tracking of biodiversity (Silvestro et al. 2022), wildlife (D. Schwartz et al. 2021), ecosystems, and illegal deforestation using drones and satellite feeds. Computer vision algorithms can automate species population estimates and habitat health assessments over huge untracked regions. These capabilities provide conservationists with powerful tools for combating poaching (Bondi et al. 2018), reducing species extinction risks, and understanding ecological shifts.\n\nSilvestro, Daniele, Stefano Goria, Thomas Sterner, and Alexandre Antonelli. 2022. “Improving Biodiversity Protection Through Artificial Intelligence.” Nature Sustainability 5 (5): 415–24. https://doi.org/10.1038/s41893-022-00851-6.\n\nSchwartz, Daniel, Jonathan Michael Gomes Selman, Peter Wrege, and Andreas Paepcke. 2021. “Deployment of Embedded Edge-AI for Wildlife Monitoring in Remote Regions.” In 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 1035–42. IEEE; IEEE. https://doi.org/10.1109/icmla52953.2021.00170.\n\nBondi, Elizabeth, Ashish Kapoor, Debadeepta Dey, James Piavis, Shital Shah, Robert Hannaford, Arvind Iyer, Lucas Joppa, and Milind Tambe. 2018. “Near Real-Time Detection of Poachers from Drones in AirSim.” In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, edited by Jérôme Lang, 5814–16. International Joint Conferences on Artificial Intelligence Organization. https://doi.org/10.24963/ijcai.2018/847.\nTargeted investment in AI applications for environmental sustainability, cross-sector data sharing, and model accessibility can profoundly accelerate solutions to pressing ecological issues. Emphasizing AI for social good steers innovation in cleaner directions, guiding these world-shaping technologies towards ethical and responsible development.\n\n\n16.2.4 Case Study\nGoogle’s data centers are foundational to powering products like Search, Gmail, and YouTube, which are used by billions daily. However, keeping the vast server farms up and running requires substantial energy, particularly for vital cooling systems. Google continuously strives to enhance efficiency across operations. Yet progress was proving difficult through traditional methods alone, considering the complex, custom dynamics involved. This challenge prompted an ML breakthrough, yielding potential savings.\nAfter over a decade of optimizing data center design, inventing energy-efficient computing hardware, and securing renewable energy sources, Google brought DeepMind scientists to unlock further advances. The AI experts faced intricate factors surrounding the functioning of industrial cooling apparatuses. Equipment like pumps and chillers interact nonlinearly, while external weather and internal architectural variables also change. Capturing this complexity confounded rigid engineering formulas and human intuition.\nThe DeepMind team leveraged Google’s extensive historical sensor data detailing temperatures, power draw, and other attributes as training inputs. They built a flexible system based on neural networks to model the relationships and predict optimal configurations, minimizing power usage effectiveness (PUE) (Barroso, Hölzle, and Ranganathan 2019); PUE is the standard measurement for gauging how efficiently a data center uses energy gives the proportion of total facility power consumed divided by the power directly used for computing operations. When tested live, the AI system delivered remarkable gains beyond prior innovations, lowering cooling energy by 40% for a 15% drop in total PUE, a new site record. The generalizable framework learned cooling dynamics rapidly across shifting conditions that static rules could not match. The breakthrough highlights AI’s rising role in transforming modern tech and enabling a sustainable future.\n\nBarroso, Luiz André, Urs Hölzle, and Parthasarathy Ranganathan. 2019. The Datacenter as a Computer: Designing Warehouse-Scale Machines. Springer International Publishing. https://doi.org/10.1007/978-3-031-01761-2.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1738,7 +1738,7 @@ "section": "16.3 Energy Consumption", "text": "16.3 Energy Consumption\n\n16.3.1 Understanding Energy Needs\nUnderstanding the energy needs for training and operating AI models is crucial in the rapidly evolving field of A.I. With AI entering widespread use in many new fields (Bohr and Memarzadeh 2020; Sudhakar, Sze, and Karaman 2023), the demand for AI-enabled devices and data centers is expected to explode. This understanding helps us understand why AI, particularly deep learning, is often labeled energy-intensive.\n\nBohr, Adam, and Kaveh Memarzadeh. 2020. “The Rise of Artificial Intelligence in Healthcare Applications.” In Artificial Intelligence in Healthcare, 25–60. Elsevier. https://doi.org/10.1016/b978-0-12-818438-7.00002-2.\n\nEnergy Requirements for AI Training\nThe training of complex AI systems like large deep learning models can demand startlingly high levels of computing power–with profound energy implications. Consider OpenAI’s state-of-the-art language model GPT-3 as a prime example. This system pushes the frontiers of text generation through algorithms trained on massive datasets. Yet, the energy GPT-3 consumed for a single training cycle could rival an entire small town’s monthly usage. In recent years, these generative AI models have gained increasing popularity, leading to more models being trained. Next to the increased number of models, the number of parameters in these models will also increase. Research shows that increasing the model size (number of parameters), dataset size, and compute used for training improves performance smoothly with no signs of saturation (Kaplan et al. 2020). See how, in Figure 16.1, the test loss decreases as each of the 3 increases above.\n\n\n\n\n\n\nFigure 16.1: Performance improves with compute, dataset set, and model size. Credit: Kaplan et al. (2020).\n\n\nKaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. “Scaling Laws for Neural Language Models.” ArXiv Preprint abs/2001.08361. https://arxiv.org/abs/2001.08361.\n\n\nWhat drives such immense requirements? During training, models like GPT-3 learn their capabilities by continuously processing huge volumes of data to adjust internal parameters. The processing capacity enabling AI’s rapid advances also contributes to surging energy usage, especially as datasets and models balloon. GPT-3 highlights a steady trajectory in the field where each leap in AI’s sophistication traces back to ever more substantial computational power and resources. Its predecessor, GPT-2, required 10x less training to compute only 1.5 billion parameters, a difference now dwarfed by magnitudes as GPT-3 comprises 175 billion parameters. Sustaining this trajectory toward increasingly capable AI raises energy and infrastructure provision challenges ahead.\n\n\nOperational Energy Use\nDeveloping and training AI models requires immense data, computing power, and energy. However, the deployment and operation of those models also incur significant recurrent resource costs over time. AI systems are now integrated across various industries and applications and are entering the daily lives of an increasing demographic. Their cumulative operational energy and infrastructure impacts could eclipse the upfront model training.\nThis concept is reflected in the demand for training and inference hardware in data centers and on the edge. Inference refers to using a trained model to make predictions or decisions on real-world data. According to a recent McKinsey analysis, the need for advanced systems to train ever-larger models is rapidly growing. However, inference computations already make up a dominant and increasing portion of total AI workloads, as shown in Figure 16.2. Running real-time inference with trained models–whether for image classification, speech recognition, or predictive analytics–invariably demands computing hardware like servers and chips. However, even a model handling thousands of facial recognition requests or natural language queries daily is dwarfed by massive platforms like Meta. Where inference on millions of photos and videos shared on social media, the infrastructure energy requirements continue to scale!\n\n\n\n\n\n\nFigure 16.2: Market size for inference and training hardware. Credit: McKinsey.\n\n\n\nAlgorithms powering AI-enabled smart assistants, automated warehouses, self-driving vehicles, tailored healthcare, and more have marginal individual energy footprints. However, the projected proliferation of these technologies could add hundreds of millions of endpoints running AI algorithms continually, causing the scale of their collective energy requirements to surge. Current efficiency gains need help to counterbalance this sheer growth.\nAI is expected to see an annual growth rate of 37.3% between 2023 and 2030. Yet, applying the same growth rate to operational computing could multiply annual AI energy needs up to 1,000 times by 2030. So, while model optimization tackles one facet, responsible innovation must also consider total lifecycle costs at global deployment scales that were unfathomable just years ago but now pose infrastructure and sustainability challenges ahead.\n\n\n\n16.3.2 Data Centers and Their Impact\nAs the demand for AI services grows, the impact of data centers on the energy consumption of AI systems is becoming increasingly important. While these facilities are crucial for the advancement and deployment of AI, they contribute significantly to its energy footprint.\n\nScale\nData centers are the essential workhorses enabling the recent computational demands of advanced AI systems. For example, leading providers like Meta operate massive data centers spanning up to the size of multiple football fields, housing hundreds of thousands of high-capacity servers optimized for parallel processing and data throughput.\nThese massive facilities provide the infrastructure for training complex neural networks on vast datasets. For instance, based on leaked information, OpenAI’s language model GPT-4 was trained on Azure data centers packing over 25,000 Nvidia A100 GPUs, used continuously for over 90 to 100 days.\nAdditionally, real-time inference for consumer AI applications at scale is only made possible by leveraging the server farms inside data centers. Services like Alexa, Siri, and Google Assistant process billions of voice requests per month from users globally by relying on data center computing for low-latency response. In the future, expanding cutting-edge use cases like self-driving vehicles, precision medicine diagnostics, and accurate climate forecasting models will require significant computational resources to be obtained by tapping into vast on-demand cloud computing resources from data centers. Some emerging applications, like autonomous cars, have harsh latency and bandwidth constraints. Locating data center-level computing power on the edge rather than the cloud will be necessary.\nMIT research prototypes have shown trucks and cars with onboard hardware performing real-time AI processing of sensor data equivalent to small data centers (Sudhakar, Sze, and Karaman 2023). These innovative “data centers on wheels” demonstrate how vehicles like self-driving trucks may need embedded data center-scale compute on board to achieve millisecond system latency for navigation, though still likely supplemented by wireless 5G connectivity to more powerful cloud data centers.\n\nSudhakar, Soumya, Vivienne Sze, and Sertac Karaman. 2023. “Data Centers on Wheels: Emissions from Computing Onboard Autonomous Vehicles.” IEEE Micro 43 (1): 29–39. https://doi.org/10.1109/mm.2022.3219803.\nThe bandwidth, storage, and processing capacities required to enable this future technology at scale will depend heavily on advancements in data center infrastructure and AI algorithmic innovations.\n\n\nEnergy Demand\nThe energy demand of data centers can roughly be divided into 4 components—infrastructure, network, storage, and servers. In Figure 16.3, we see that the data infrastructure (which includes cooling, lighting, and controls) and the servers use most of the total energy budget of data centers in the US (Shehabi et al. 2016). This section breaks down the energy demand for the servers and the infrastructure. For the latter, the focus is on cooling systems, as cooling is the dominant factor in energy consumption in the infrastructure.\n\nShehabi, Arman, Sarah Smith, Dale Sartor, Richard Brown, Magnus Herrlin, Jonathan Koomey, Eric Masanet, Nathaniel Horner, Inês Azevedo, and William Lintner. 2016. “United States Data Center Energy Usage Report.”\n\n\n\n\n\n\nFigure 16.3: Data centers energy consumption in the US. Credit: International Energy Agency (IEA).\n\n\n\n\nServers\nThe increase in energy consumption of data centers stems mainly from exponentially growing AI computing requirements. NVIDIA DGX H100 machines that are optimized for deep learning can draw up to 10.2 kW at peak. Leading providers operate data centers with hundreds to thousands of these power-hungry DGX nodes networked to train the latest AI models. For example, the supercomputer developed for OpenAI is a single system with over 285,000 CPU cores, 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server.\nThe intensive computations needed across an entire facility’s densely packed fleet and supporting hardware result in data centers drawing tens of megawatts around the clock. Overall, advancing AI algorithms continue to expand data center energy consumption as more DGX nodes get deployed to keep pace with projected growth in demand for AI compute resources over the coming years.\n\n\nCooling Systems\nTo keep the beefy servers fed at peak capacity and cool, data centers require tremendous cooling capacity to counteract the heat produced by densely packed servers, networking equipment, and other hardware running computationally intensive workloads without pause. With large data centers packing thousands of server racks operating at full tilt, massive industrial-scale cooling towers and chillers are required, using energy amounting to 30-40% of the total data center electricity footprint (Dayarathna, Wen, and Fan 2016). Consequently, companies are looking for alternative methods of cooling. For example, Microsoft’s data center in Ireland leverages a nearby fjord to exchange heat using over half a million gallons of seawater daily.\nRecognizing the importance of energy-efficient cooling, there have been innovations aimed at reducing this energy demand. Techniques like free cooling, which uses outside air or water sources when conditions are favorable, and the use of AI to optimize cooling systems are examples of how the industry adapts. These innovations reduce energy consumption, lower operational costs, and lessen the environmental footprint. However, exponential increases in AI model complexity continue to demand more servers and acceleration hardware operating at higher utilization, translating to rising heat generation and ever greater energy used solely for cooling purposes.\n\n\n\nThe Environmental Impact\nThe environmental impact of data centers is not only caused by the direct energy consumption of the data center itself (Siddik, Shehabi, and Marston 2021). Data center operation involves the supply of treated water to the data center and the discharge of wastewater from the data center. Water and wastewater facilities are major electricity consumers.\n\nSiddik, Md Abu Bakar, Arman Shehabi, and Landon Marston. 2021. “The Environmental Footprint of Data Centers in the United States.” Environ. Res. Lett. 16 (6): 064017. https://doi.org/10.1088/1748-9326/abfba1.\n\nDavis, Jacqueline, Daniel Bizo, Andy Lawrence, Owen Rogers, and Max Smolaks. 2022. “Uptime Institute Global Data Center Survey 2022.” Uptime Institute.\nNext to electricity usage, there are many more aspects to the environmental impacts of these data centers. The water usage of the data centers can lead to water scarcity issues, increased water treatment needs, and proper wastewater discharge infrastructure. Also, raw materials required for construction and network transmission considerably impact environmental t the environment, and components in data centers need to be upgraded and maintained. Where almost 50 percent of servers were refreshed within 3 years of usage, refresh cycles have shown to slow down (Davis et al. 2022). Still, this generates significant e-waste, which can be hard to recycle.\n\n\n\n16.3.3 Energy Optimization\nUltimately, measuring and understanding the energy consumption of AI facilitates optimizing energy consumption.\nOne way to reduce the energy consumption of a given amount of computational work is to run it on more energy-efficient hardware. For instance, TPU chips can be more energy-efficient compared to CPUs when it comes to running large tensor computations for AI, as TPUs can run such computations much faster without drawing significantly more power than CPUs. Another way is to build software systems aware of energy consumption and application characteristics. Good examples are systems works such as Zeus (You, Chung, and Chowdhury 2023) and Perseus (Chung et al. 2023), both of which characterize the tradeoff between computation time and energy consumption at various levels of an ML training system to achieve energy reduction without end-to-end slowdown. In reality, building both energy-efficient hardware and software and combining their benefits should be promising, along with open-source frameworks (e.g., Zeus) that facilitate community efforts.\n\nYou, Jie, Jae-Won Chung, and Mosharaf Chowdhury. 2023. “Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training.” In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), 119–39. Boston, MA: USENIX Association. https://www.usenix.org/conference/nsdi23/presentation/you.\n\nChung, Jae-Won, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, and Mosharaf Chowdhury. 2023. “Perseus: Removing Energy Bloat from Large Model Training.” ArXiv Preprint abs/2312.06902. https://arxiv.org/abs/2312.06902.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1749,7 +1749,7 @@ "section": "16.4 Carbon Footprint", "text": "16.4 Carbon Footprint\nThe massive electricity demands of data centers can lead to significant environmental externalities absent an adequate renewable power supply. Many facilities rely heavily on nonrenewable energy sources like coal and natural gas. For example, data centers are estimated to produce up to 2% of total global \\(\\textrm{CO}_2\\) emissions which is closing the gap with the airline industry. As mentioned in previous sections, the computational demands of AI are set to increase. The emissions of this surge are threefold. First, data centers are projected to increase in size (Liu et al. 2020). Secondly, emissions during training are set to increase significantly (Patterson et al. 2022). Thirdly, inference calls to these models are set to increase dramatically.\n\nLiu, Yanan, Xiaoxia Wei, Jinyu Xiao, Zhijie Liu, Yang Xu, and Yun Tian. 2020. “Energy Consumption and Emission Mitigation Prediction Based on Data Center Traffic and PUE for Global Data Centers.” Global Energy Interconnection 3 (3): 272–82. https://doi.org/10.1016/j.gloei.2020.07.008.\nWithout action, this exponential demand growth risks ratcheting up the carbon footprint of data centers further to unsustainable levels. Major providers have pledged carbon neutrality and committed funds to secure clean energy, but progress remains incremental compared to overall industry expansion plans. More radical grid decarbonization policies and renewable energy investments may prove essential to counteracting the climate impact of the coming tide of new data centers aimed at supporting the next generation of AI.\n\n16.4.1 Definition and Significance\nThe concept of a ‘carbon footprint’ has emerged as a key metric. This term refers to the total amount of greenhouse gasses, particularly carbon dioxide, emitted directly or indirectly by an individual, organization, event, or product. These emissions significantly contribute to the greenhouse effect, accelerating global warming and climate change. The carbon footprint is measured in terms of carbon dioxide equivalents (\\(\\textrm{CO}_2\\)e), allowing for a comprehensive account that includes various greenhouse gasses and their relative environmental impact. Examples of this as applied to large-scale ML tasks are shown in Figure 16.4.\n\n\n\n\n\n\nFigure 16.4: Carbon footprint of large-scale ML tasks. Credit: Wu et al. (2022).\n\n\n\nConsidering the carbon footprint is especially important in AI AI’s rapid advancement and integration into various sectors, bringing its environmental impact into sharp focus. AI systems, particularly those involving intensive computations like deep learning and large-scale data processing, are known for their substantial energy demands. This energy, often drawn from power grids, may still predominantly rely on fossil fuels, leading to significant greenhouse gas emissions.\nTake, for example, training large AI models such as GPT-3 or complex neural networks. These processes require immense computational power, typically provided by data centers. The energy consumption associated with operating these centers, particularly for high-intensity tasks, results in notable greenhouse gas emissions. Studies have highlighted that training a single AI model can generate carbon emissions comparable to that of the lifetime emissions of multiple cars, shedding light on the environmental cost of developing advanced AI technologies (Dayarathna, Wen, and Fan 2016). Figure 16.5 shows a comparison from lowest to highest carbon footprints, starting with a roundtrip flight between NY and SF, human life average per year, American life average per year, US car including fuel over a lifetime, and a Transformer model with neural architecture search, which has the highest footprint.\n\n\n\n\n\n\nFigure 16.5: Carbon footprint of NLP model in lbs of \\(\\textrm{CO}_2\\) equivalent. Credit: Dayarathna, Wen, and Fan (2016).\n\n\nDayarathna, Miyuru, Yonggang Wen, and Rui Fan. 2016. “Data Center Energy Consumption Modeling: A Survey.” IEEE Communications Surveys &Amp; Tutorials 18 (1): 732–94. https://doi.org/10.1109/comst.2015.2481183.\n\n\nMoreover, AI’s carbon footprint extends beyond the operational phase. The entire lifecycle of AI systems, including the manufacturing of computing hardware, the energy used in data centers for cooling and maintenance, and the disposal of electronic waste, contributes to their overall carbon footprint. We have discussed some of these aspects earlier, and we will discuss the waste aspects later in this chapter.\n\n\n16.4.2 The Need for Awareness and Action\nUnderstanding the carbon footprint of AI systems is crucial for several reasons. Primarily, it is a step towards mitigating the impacts of climate change. As AI continues to grow and permeate different aspects of our lives, its contribution to global carbon emissions becomes a significant concern. Awareness of these emissions can inform decisions made by developers, businesses, policymakers, and even ML engineers and scientists like us to ensure a balance between technological innovation and environmental responsibility.\nFurthermore, this understanding stimulates the drive towards ‘Green AI’ (R. Schwartz et al. 2020). This approach focuses on developing AI technologies that are efficient, powerful, and environmentally sustainable. It encourages exploring energy-efficient algorithms, using renewable energy sources in data centers, and adopting practices that reduce A. I’m the overall environmental impact.\nIn essence, the carbon footprint is an essential consideration in developing and applying AI technologies. As AI evolves and its applications become more widespread, managing its carbon footprint is key to ensuring that this technological progress aligns with the broader environmental sustainability goals.\n\n\n16.4.3 Estimating the AI Carbon Footprint\nEstimating AI systems’ carbon footprint is critical in understanding their environmental impact. This involves analyzing the various elements contributing to emissions throughout AI technologies’ lifecycle and employing specific methodologies to quantify these emissions accurately. Many different methods for quantifying ML’s carbon emissions have been proposed.\nThe carbon footprint of AI encompasses several key elements, each contributing to the overall environmental impact. First, energy is consumed during the AI model training and operational phases. The source of this energy heavily influences the carbon emissions. Once trained, these models, depending on their application and scale, continue to consume electricity during operation. Next to energy considerations, the hardware used stresses the environment as well.\nThe carbon footprint varies significantly based on the energy sources used. The composition of the sources providing the energy used in the grid varies widely depending on geographical region and even time in a single day! For example, in the USA, roughly 60 percent of the total energy supply is still covered by fossil fuels. Nuclear and renewable energy sources cover the remaining 40 percent. These fractions are not constant throughout the day. As renewable energy production usually relies on environmental factors, such as solar radiation and pressure fields, they do not provide a constant energy source.\nThe variability of renewable energy production has been an ongoing challenge in the widespread use of these sources. Looking at Figure 16.6, which shows data for the European grid, we see that it is supposed to be able to produce the required amount of energy throughout the day. While solar energy peaks in the middle of the day, wind energy shows two distinct peaks in the mornings and evenings. Currently, we rely on fossil and coal-based energy generation methods to supply the lack of energy during times when renewable energy does not meet requirements,\nInnovation in energy storage solutions is required to enable constant use of renewable energy sources. The base energy load is currently met with nuclear energy. This constant energy source does not directly emit carbon emissions but needs to be faster to accommodate the variability of renewable energy sources. Tech companies such as Microsoft have shown interest in nuclear energy sources to power their data centers. As the demand of data centers is more constant than the demand of regular households, nuclear energy could be used as a dominant source of energy.\n\n\n\n\n\n\nFigure 16.6: Energy sources and generation capabilities. Credit: Energy Charts..\n\n\n\nAdditionally, the manufacturing and disposal of AI hardware add to the carbon footprint. Producing specialized computing devices, such as GPUs and CPUs, is energy- and resource-intensive. This phase often relies on energy sources that contribute to greenhouse gas emissions. The electronics industry’s manufacturing process has been identified as one of the eight big supply chains responsible for more than 50 percent of global emissions (Challenge 2021). Furthermore, the end-of-life disposal of this hardware, which can lead to electronic waste, also has environmental implications. As mentioned, servers have a refresh cycle of roughly 3 to 5 years. Of this e-waste, currently only 17.4 percent is properly collected and recycled.. The carbon emissions of this e-waste has shown an increase of more than 50 percent between 2014 and 2020 (Singh and Ogunseitan 2022).\n\nChallenge, WEF Net-Zero. 2021. “The Supply Chain Opportunity.” In World Economic Forum: Geneva, Switzerland.\n\nSingh, Narendra, and Oladele A. Ogunseitan. 2022. “Disentangling the Worldwide Web of e-Waste and Climate Change Co-Benefits.” Circular Economy 1 (2): 100011. https://doi.org/10.1016/j.cec.2022.100011.\nAs is clear from the above, a proper Life Cycle Analysis is necessary to portray all relevant aspects of the emissions caused by AI. Another method is carbon accounting, which quantifies the amount of carbon dioxide emissions directly and indirectly associated with AI operations. This measurement typically uses \\(\\textrm{CO}_2\\) equivalents, allowing for a standardized way of reporting and assessing emissions.\n\n\n\n\n\n\nExercise 16.1: AI’s Carbon Footprint\n\n\n\n\n\nDid you know that the cutting-edge AI models you might use have an environmental impact? This exercise will delve into an AI system’s “carbon footprint.” You’ll learn how data centers’ energy demands, large AI models’ training, and even hardware manufacturing contribute to greenhouse gas emissions. We’ll discuss why it’s crucial to be aware of this impact, and you’ll learn methods to estimate the carbon footprint of your own AI projects. Get ready to explore the intersection of AI and environmental sustainability!", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1760,7 +1760,7 @@ "section": "16.5 Beyond Carbon Footprint", "text": "16.5 Beyond Carbon Footprint\nThe current focus on reducing AI systems’ carbon emissions and energy consumption addresses one crucial aspect of sustainability. However, manufacturing the semiconductors and hardware that enable AI also carries severe environmental impacts that receive comparatively less public attention. Building and operating a leading-edge semiconductor fabrication plant, or “fab,” has substantial resource requirements and polluting byproducts beyond a large carbon footprint.\nFor example, a state-of-the-art fab producing state-of-the-art chips like in 5nm can require up to four million gallons of pure water each day. This water usage approaches what a city of half a million people would require for all needs. Sourcing this consistently places immense strain on local water tables and reservoirs, especially in already water-stressed regions that host many high-tech manufacturing hubs.\nAdditionally, over 250 unique hazardous chemicals are utilized at various stages of semiconductor production within fabs (Mills and Le Hunte 1997). These include volatile solvents like sulfuric acid, nitric acid, and hydrogen fluoride, along with arsine, phosphine, and other highly toxic substances. Preventing the discharge of these chemicals requires extensive safety controls and wastewater treatment infrastructure to avoid soil contamination and risks to surrounding communities. Any improper chemical handling or unanticipated spill carries dire consequences.\n\nMills, Andrew, and Stephen Le Hunte. 1997. “An Overview of Semiconductor Photocatalysis.” J. Photochem. Photobiol., A 108 (1): 1–35. https://doi.org/10.1016/s1010-6030(97)00118-4.\nBeyond water consumption and chemical risks, fab operations also depend on rare metals sourcing, generate tons of dangerous waste products, and can hamper local biodiversity. This section will analyze these critical but less discussed impacts. With vigilance and investment in safety, the harms from semiconductor manufacturing can be contained while still enabling technological progress. However, ignoring these externalized issues will exacerbate ecological damage and health risks over the long run.\n\n16.5.1 Water Usage and Stress\nSemiconductor fabrication is an incredibly water-intensive process. Based on an article from 2009, a typical 300mm silicon wafer requires 8,328 liters of water, of which 5,678 liters is ultrapure water (Cope 2009). Today, a typical fab can use up to four million gallons of pure water. To operate one facility, TSMC’s latest fab in Arizona is projected to use 8.9 million gallons daily or nearly 3 percent of the city’s current water production. To put things in perspective, Intel and Quantis found that over 97% of their direct water consumption is attributed to semiconductor manufacturing operations within their fabrication facilities (Cooper et al. 2011).\n\nCope, Gord. 2009. “Pure Water, Semiconductors and the Recession.” Global Water Intelligence 10 (10).\n\nCooper, Tom, Suzanne Fallender, Joyann Pafumi, Jon Dettling, Sebastien Humbert, and Lindsay Lessard. 2011. “A Semiconductor Company’s Examination of Its Water Footprint Approach.” In Proceedings of the 2011 IEEE International Symposium on Sustainable Systems and Technology, 1–6. IEEE; IEEE. https://doi.org/10.1109/issst.2011.5936865.\nThis water is repeatedly used to flush away contaminants in cleaning steps and also acts as a coolant and carrier fluid in thermal oxidation, chemical deposition, and chemical mechanical planarization processes. During peak summer months, this approximates the daily water consumption of a city with a population of half a million people.\nDespite being located in regions with sufficient water, the intensive usage can severely depress local water tables and drainage basins. For example, the city of Hsinchu in Taiwan suffered sinking water tables and seawater intrusion into aquifers due to excessive pumping to satisfy water supply demands from the Taiwan Semiconductor Manufacturing Company (TSMC) fab. In water-scarce inland areas like Arizona, massive water inputs are needed to support fabs despite already strained reservoirs.\nWater discharge from fabs risks environmental contamination besides depletion if not properly treated. While much discharge is recycled within the fab, the purification systems still filter out metals, acids, and other contaminants that can pollute rivers and lakes if not cautiously handled (Prakash, Callahan, et al. 2023). These factors make managing water usage essential when mitigating wider sustainability impacts.\n\n\n16.5.2 Hazardous Chemicals Usage\nModern semiconductor fabrication involves working with many highly hazardous chemicals under extreme conditions of heat and pressure (Kim et al. 2018). Key chemicals utilized include:\n\nKim, Sunju, Chungsik Yoon, Seunghon Ham, Jihoon Park, Ohun Kwon, Donguk Park, Sangjun Choi, Seungwon Kim, Kwonchul Ha, and Won Kim. 2018. “Chemical Use in the Semiconductor Manufacturing Industry.” Int. J. Occup. Env. Heal. 24 (3-4): 109–18. https://doi.org/10.1080/10773525.2018.1519957.\n\nStrong acids: Hydrofluoric, sulfuric, nitric, and hydrochloric acids rapidly eat through oxides and other surface contaminants but also pose toxicity dangers. Fabs can use thousands of metric tons of these acids annually, and accidental exposure can be fatal for workers.\nSolvents: Key solvents like xylene, methanol, and methyl isobutyl ketone (MIBK) handle dissolving photoresists but have adverse health impacts like skin/eye irritation and narcotic effects if mishandled. They also create explosive and air pollution risks.\nToxic gases: Gas mixtures containing arsine (AsH3), phosphine (PH3), diborane (B2H6), germane (GeH4), etc., are some of the deadliest chemicals used in doping and vapor deposition steps. Minimal exposures can lead to poisoning, tissue damage, and even death without quick treatment.\nChlorinated compounds: Older chemical mechanical planarization formulations incorporated perchloroethylene, trichloroethylene, and other chlorinated solvents, which have since been banned due to their carcinogenic effects and impacts on the ozone layer. However, their prior release still threatens surrounding groundwater sources.\n\nStrict handling protocols, protective equipment for workers, ventilation, filtrating/scrubbing systems, secondary containment tanks, and specialized disposal mechanisms are vital where these chemicals are used to minimize health, explosion, air, and environmental spill dangers (Wald and Jones 1987). But human errors and equipment failures still occasionally occur–highlighting why reducing fab chemical intensities is an ongoing sustainability effort.\n\nWald, Peter H., and Jeffrey R. Jones. 1987. “Semiconductor Manufacturing: An Introduction to Processes and Hazards.” Am. J. Ind. Med. 11 (2): 203–21. https://doi.org/10.1002/ajim.4700110209.\n\n\n16.5.3 Resource Depletion\nWhile silicon forms the base, there is an almost endless supply of silicon on Earth. In fact, silicon is the second most plentiful element found in the Earth’s crust, accounting for 27.7% of the crust’s total mass. Only oxygen exceeds silicon in abundance within the crust. Therefore, silicon is not necessary to consider for resource depletion. However, the various specialty metals and materials that enable the integrated circuit fabrication process and provide specific properties still need to be discovered. Maintaining supplies of these resources is crucial yet threatened by finite availability and geopolitical influences (Nakano 2021).\n\nNakano, Jane. 2021. The Geopolitics of Critical Minerals Supply Chains. JSTOR.\n\nChen, H.-W. 2006. “Gallium, Indium, and Arsenic Pollution of Groundwater from a Semiconductor Manufacturing Area of Taiwan.” B. Environ. Contam. Tox. 77 (2): 289–96. https://doi.org/10.1007/s00128-006-1062-3.\nGallium, indium, and arsenic are vital ingredients in forming ultra-efficient compound semiconductors in the highest-speed chips suited for 5G and AI applications (Chen 2006). However, these rare elements have relatively scarce natural deposits that are being depleted. The United States Geological Survey has indium on its list of most critical at-risk commodities, estimated to have less than a 15-year viable global supply at current demand growth (Davies 2011).\nHelium is required in huge volumes for next-gen fabs to enable precise wafer cooling during operation. But helium’s relative rarity and the fact that once it vents into the atmosphere, it quickly escapes Earth make maintaining helium supplies extremely challenging long-term (Davies 2011). According to the US National Academies, substantial price increases and supply shocks are already occurring in this thinly traded market.\n\nJha, A. R. 2014. Rare Earth Materials: Properties and Applications. CRC Press. https://doi.org/10.1201/b17045.\nOther risks include China’s control over 90% of the rare earth elements critical to semiconductor material production (Jha 2014). Any supply chain issues or trade disputes can lead to catastrophic raw material shortages, given the lack of current alternatives. In conjunction with helium shortages, resolving the limited availability and geographic imbalance in accessing essential ingredients remains a sector priority for sustainability.\n\n\n16.5.4 Hazardous Waste Generation\nSemiconductor fabs generate tons of hazardous waste annually as byproducts from the various chemical processes (Grossman 2007). The key waste streams include:\n\nGrossman, Elizabeth. 2007. High Tech Trash: Digital Devices, Hidden Toxics, and Human Health. Island press.\n\nGaseous waste: Fab ventilation systems capture harmful gases like arsine, phosphine, and germane and filter them out to avoid worker exposure. However, this produces significant quantities of dangerous condensed gas that need specialized treatment.\nVOCs: Volatile organic compounds like xylene, acetone, and methanol are used extensively as photoresist solvents and are evaporated as emissions during baking, etching, and stripping. VOCs pose toxicity issues and require scrubbing systems to prevent release.\nSpent acids: Strong acids such as sulfuric acid, hydrofluoric acid, and nitric acid get depleted in cleaning and etching steps, transforming into a corrosive, toxic soup that can dangerously react, releasing heat and fumes if mixed.\nSludge: Water treatment of discharged effluent contains concentrated heavy metals, acid residues, and chemical contaminants. Filter press systems separate this hazardous sludge.\nFilter cake: Gaseous filtration systems generate multi-ton sticky cakes of dangerous absorbed compounds requiring containment.\n\nWithout proper handling procedures, storage tanks, packaging materials, and secondary containment, improper disposal of any of these waste streams can lead to dangerous spills, explosions, and environmental releases. The massive volumes mean even well-run fabs produce tons of hazardous waste year after year, requiring extensive treatment.\n\n\n16.5.5 Biodiversity Impacts\n\nHabitat Disruption and Fragmentation\nSemiconductor fabs require large, contiguous land areas to accommodate cleanrooms, support facilities, chemical storage, waste treatment, and ancillary infrastructure. Developing these vast built-up spaces inevitably dismantles existing habitats, damaging sensitive biomes that may have taken decades to develop. For example, constructing a new fabrication module may level local forest ecosystems that species, like spotted owls and elk, rely upon for survival. The outright removal of such habitats severely threatens wildlife populations dependent on those lands.\nFurthermore, pipelines, water channels, air and waste exhaust systems, access roads, transmission towers, and other support infrastructure fragment the remaining undisturbed habitats. Animals moving daily for food, water, and spawning can find their migration patterns blocked by these physical human barriers that bisect previously natural corridors.\n\n\nAquatic Life Disturbances\nWith semiconductor fabs consuming millions of gallons of ultra-pure water daily, accessing and discharging such volumes risks altering the suitability of nearby aquatic environments housing fish, water plants, amphibians, and other species. If the fab is tapping groundwater tables as its primary supply source, overdrawing at unsustainable rates can deplete lakes or lead to stream drying as water levels drop (Davies 2011).\n\nDavies, Emma. 2011. “Endangered Elements: Critical Thinking.” https://www.rsc.org/images/Endangered\\%20Elements\\%20-\\%20Critical\\%20Thinking\\_tcm18-196054.pdf.\n\nLeRoy Poff, N, MM Brinson, and JW Day. 2002. “Aquatic Ecosystems & Global Climate Change.” Pew Center on Global Climate Change.\n\nTill, Aaron, Andrew L. Rypel, Andrew Bray, and Samuel B. Fey. 2019. “Fish Die-Offs Are Concurrent with Thermal Extremes in North Temperate Lakes.” Nat. Clim. Change 9 (8): 637–41. https://doi.org/10.1038/s41558-019-0520-y.\nAlso, discharging wastewater at higher temperatures to cool fabrication equipment can shift downstream river conditions through thermal pollution. Temperature changes beyond thresholds that native species evolved for can disrupt reproductive cycles. Warmer water also holds less dissolved oxygen, critical to supporting aquatic plant and animal life (LeRoy Poff, Brinson, and Day 2002). Combined with traces of residual contaminants that escape filtration systems, the discharged water can cumulatively transform environments to be far less habitable for sensitive organisms (Till et al. 2019).\n\n\nAir and Chemical Emissions\nWhile modern semiconductor fabs aim to contain air and chemical discharges through extensive filtration systems, some levels of emissions often persist, raising risks for nearby flora and fauna. Air pollutants can carry downwind, including volatile organic compounds (VOCs), nitrogen oxide compounds (NOx), particulate matter from fab operational exhausts, and power plant fuel emissions.\nAs contaminants permeate local soils and water sources, wildlife ingesting affected food and water ingest toxic substances, which research shows can hamper cell function, reproduction rates, and longevity–slowly poisoning ecosystems (Hsu et al. 2016).\n\nHsu, Liang-Ching, Ching-Yi Huang, Yen-Hsun Chuang, Ho-Wen Chen, Ya-Ting Chan, Heng Yi Teah, Tsan-Yao Chen, Chiung-Fen Chang, Yu-Ting Liu, and Yu-Min Tzou. 2016. “Accumulation of Heavy Metals and Trace Elements in Fluvial Sediments Received Effluents from Traditional and Semiconductor Industries.” Scientific Reports 6 (1): 34250. https://doi.org/10.1038/srep34250.\nLikewise, accidental chemical spills and improper waste handling, which release acids, BODs, and heavy metals into soils, can dramatically affect retention and leeching capabilities. Flora, such as vulnerable native orchids adapted to nutrient-poor substrates, can experience die-offs when contacted by foreign runoff chemicals that alter soil pH and permeability. One analysis found that a single 500-gallon nitric acid spill led to the regional extinction of a rare moss species in the year following when the acidic effluent reached nearby forest habitats. Such contamination events set off chain reactions across the interconnected web of life. Thus, strict protocols are essential to avoid hazardous discharge and runoff.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1771,7 +1771,7 @@ "section": "16.6 Life Cycle Analysis", "text": "16.6 Life Cycle Analysis\nUnderstanding the holistic environmental impact of AI systems requires a comprehensive approach that considers the entire life cycle of these technologies. Life Cycle Analysis (LCA) refers to a methodological framework used to quantify the environmental impacts across all stages in a product or system’s lifespan, from raw material extraction to end-of-life disposal. Applying LCA to AI systems can help identify priority areas to target for reducing overall environmental footprints.\n\n16.6.1 Stages of an AI System’s Life Cycle\nThe life cycle of an AI system can be divided into four key phases:\n\nDesign Phase: This includes the energy and resources used in researching and developing AI technologies. It encompasses the computational resources used for algorithm development and testing contributing to carbon emissions.\nManufacture Phase: This stage involves producing hardware components such as graphics cards, processors, and other computing devices necessary for running AI algorithms. Manufacturing these components often involves significant energy for material extraction, processing, and greenhouse gas emissions.\nUse Phase: The next most energy-intensive phase involves the operational use of AI systems. It includes the electricity consumed in data centers for training and running neural networks and powering end-user applications. This is arguably one of the most carbon-intensive stages.\nDisposal Phase: This final stage covers the end-of-life aspects of AI systems, including the recycling and disposal of electronic waste generated from outdated or non-functional hardware past their usable lifespan.\n\n\n\n16.6.2 Environmental Impact at Each Stage\nDesign and Manufacturing\nThe environmental impact during these beginning-of-life phases includes emissions from energy use and resource depletion from extracting materials for hardware production. At the heart of AI hardware are semiconductors, primarily silicon, used to make the integrated circuits in processors and memory chips. This hardware manufacturing relies on metals like copper for wiring, aluminum for casings, and various plastics and composites for other components. It also uses rare earth metals and specialized alloys- elements like neodymium, terbium, and yttrium- used in small but vital quantities. For example, the creation of GPUs relies on copper and aluminum. At the same time, chips use rare earth metals, which is the mining process that can generate substantial carbon emissions and ecosystem damage.\nUse Phase\nAI computes the majority of emissions in the lifecycle due to continuous high-power consumption, especially for training and running models. This includes direct and indirect emissions from electricity usage and nonrenewable grid energy generation. Studies estimate training complex models can have a carbon footprint comparable to the lifetime emissions of up to five cars.\nDisposal Phase\nThe disposal stage impacts include air and water pollution from toxic materials in devices, challenges associated with complex electronics recycling, and contamination when improperly handled. Harmful compounds from burned e-waste are released into the atmosphere. At the same time, landfill leakage of lead, mercury, and other materials poses risks of soil and groundwater contamination if not properly controlled. Implementing effective electronics recycling is crucial.\n\n\n\n\n\n\nExercise 16.2: Tracking ML Emissions\n\n\n\n\n\nIn this exercise, you’ll delve into the environmental impact of training machine learning models. We’ll use CodeCarbon to track emissions, learn about Life Cycle Analysis (LCA) to understand AI’s carbon footprint, and explore strategies to make your ML model development more environmentally friendly. By the end, you’ll be equipped to track the carbon emissions of your models and start implementing greener practices in your projects.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1782,7 +1782,7 @@ "section": "16.7 Challenges in LCA", "text": "16.7 Challenges in LCA\n\n16.7.1 Lack of Consistency and Standards\nOne major challenge facing life cycle analysis (LCA) for AI systems is the need for consistent methodological standards and frameworks. Unlike product categories like building materials, which have developed international standards for LCA through ISO 14040, there are no firmly established guidelines for analyzing the environmental footprint of complex information technology like AI.\nThis absence of uniformity means researchers make differing assumptions and varying methodological choices. For example, a 2021 study from the University of Massachusetts Amherst (Strubell, Ganesh, and McCallum 2019) analyzed the life cycle emissions of several natural language processing models but only considered computational resource usage for training and omitted hardware manufacturing impacts. A more comprehensive 2020 study from Stanford University researchers included emissions estimates from producing relevant servers, processors, and other components, following an ISO-aligned LCA standard for computer hardware. However, these diverging choices in system boundaries and accounting approaches reduce robustness and prevent apples-to-apples comparisons of results.\nStandardized frameworks and protocols tailored to AI systems’ unique aspects and rapid update cycles would provide more coherence. This could equip researchers and developers to understand environmental hotspots, compare technology options, and accurately track progress on sustainability initiatives across the AI field. Industry groups and international standards bodies like the IEEE or ACM should prioritize addressing this methodological gap.\n\n\n16.7.2 Data Gaps\nAnother key challenge for comprehensive life cycle assessment of AI systems is substantial data gaps, especially regarding upstream supply chain impacts and downstream electronic waste flows. Most existing studies focus narrowly on the learner or usage phase emissions from computational power demands, which misses a significant portion of lifetime emissions (Gupta et al. 2022).\nFor example, little public data from companies exists quantifying energy use and emissions from manufacturing the specialized hardware components that enable AI–including high-end GPUs, ASIC chips, solid-state drives, and more. Researchers often rely on secondary sources or generic industry averages to approximate production impacts. Similarly, on average, there is limited transparency into downstream fate once AI systems are discarded after 4-5 years of usable lifespans.\nWhile electronic waste generation levels can be estimated, specifics on hazardous material leakage, recycling rates, and disposal methods for the complex components are hugely uncertain without better corporate documentation or regulatory reporting requirements.\nThe need for fine-grained data on computational resource consumption for training different model types makes reliable per-parameter or per-query emissions calculations difficult even for the usage phase. Attempts to create lifecycle inventories estimating average energy needs for key AI tasks exist (Henderson et al. 2020; Anthony, Kanding, and Selvan 2020), but variability across hardware setups, algorithms, and input data uncertainty remains extremely high. Furthermore, real-time carbon intensity data, critical in accurately tracking operational carbon footprint, must be improved in many geographic locations, rendering existing tools for operational carbon emission mere approximations based on annual average carbon intensity values.\n\nHenderson, Peter, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, and Joelle Pineau. 2020. “Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning.” The Journal of Machine Learning Research 21 (1): 10039–81.\n\nAnthony, Lasse F. Wolff, Benjamin Kanding, and Raghavendra Selvan. 2020. ICML Workshop on Challenges in Deploying and monitoring Machine Learning Systems.\nThe challenge is that tools like CodeCarbon and ML \\(\\textrm{CO}_2\\) but these are ad hoc approaches at best. Bridging the real data gaps with more rigorous corporate sustainability disclosures and mandated environmental impact reporting will be key for AI’s overall climatic impacts to be understood and managed.\n\n\n16.7.3 Rapid Pace of Evolution\nThe extremely quick evolution of AI systems poses additional challenges in keeping life cycle assessments up-to-date and accounting for the latest hardware and software advancements. The core algorithms, specialized chips, frameworks, and technical infrastructure underpinning AI have all been advancing exceptionally fast, with new developments rapidly rendering prior systems obsolete.\nFor example, in deep learning, novel neural network architectures that achieve significantly better performance on key benchmarks or new optimized hardware like Google’s TPU chips can completely change an “average” model in less than a year. These swift shifts quickly make one-off LCA studies outdated for accurately tracking emissions from designing, running, or disposing of the latest AI.\nHowever, the resources and access required to update LCAs continuously need to be improved. Frequently re-doing labor—and data-intensive life cycle inventories and impact modeling to stay current with AI’s state-of-the-art is likely infeasible for many researchers and organizations. However, updated analyses could notice environmental hotspots as algorithms and silicon chips continue rapidly evolving.\nThis presents difficulty in balancing dynamic precision through continuous assessment with pragmatic constraints. Some researchers have proposed simplified proxy metrics like tracking hardware generations over time or using representative benchmarks as an oscillating set of goalposts for relative comparisons, though granularity may be sacrificed. Overall, the challenge of rapid change will require innovative methodological solutions to prevent underestimating AI’s evolving environmental burdens.\n\n\n16.7.4 Supply Chain Complexity\nFinally, the complex and often opaque supply chains associated with producing the wide array of specialized hardware components that enable AI pose challenges for comprehensive life cycle modeling. State-of-the-art AI relies on cutting-edge advancements in processing chips, graphics cards, data storage, networking equipment, and more. However, tracking emissions and resource use across the tiered networks of globalized suppliers for all these components is extremely difficult.\nFor example, NVIDIA graphics processing units dominate much of the AI computing hardware, but the company relies on several discrete suppliers across Asia and beyond to produce GPUs. Many firms at each supplier tier choose to keep facility-level environmental data private, which could fully enable robust LCAs. Gaining end-to-end transparency down multiple levels of suppliers across disparate geographies with varying disclosure protocols and regulations poses barriers despite being crucial for complete boundary setting. This becomes even more complex when attempting to model emerging hardware accelerators like tensor processing units (TPUs), whose production networks still need to be made public.\nWithout tech giants’ willingness to require and consolidate environmental impact data disclosure from across their global electronics supply chains, considerable uncertainty will remain around quantifying the full lifecycle footprint of AI hardware enablement. More supply chain visibility coupled with standardized sustainability reporting frameworks specifically addressing AI’s complex inputs hold promise for enriching LCAs and prioritizing environmental impact reductions.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1793,7 +1793,7 @@ "section": "16.8 Sustainable Design and Development", "text": "16.8 Sustainable Design and Development\n\n16.8.1 Sustainability Principles\nAs the impact of AI on the environment becomes increasingly evident, the focus on sustainable design and development in AI is gaining prominence. This involves incorporating sustainability principles into AI design, developing energy-efficient models, and integrating these considerations throughout the AI development pipeline. There is a growing need to consider its sustainability implications and develop principles to guide responsible innovation. Below is a core set of principles. The principles flow from the conceptual foundation to practical execution to supporting implementation factors; the principles provide a full cycle perspective on embedding sustainability in AI design and development.\nLifecycle Thinking: Encouraging designers to consider the entire lifecycle of AI systems, from data collection and preprocessing to model development, training, deployment, and monitoring. The goal is to ensure sustainability is considered at each stage. This includes using energy-efficient hardware, prioritizing renewable energy sources, and planning to reuse or recycle retired models.\nFuture Proofing: Designing AI systems anticipating future needs and changes can enhance sustainability. This may involve making models adaptable via transfer learning and modular architectures. It also includes planning capacity for projected increases in operational scale and data volumes.\nEfficiency and Minimalism: This principle focuses on creating AI models that achieve desired results with the least possible resource use. It involves simplifying models and algorithms to reduce computational requirements. Specific techniques include pruning redundant parameters, quantizing and compressing models, and designing efficient model architectures, such as those discussed in the Optimizations chapter.\nLifecycle Assessment (LCA) Integration: Analyzing environmental impacts throughout the development and deployment of lifecycles highlights unsustainable practices early on. Teams can then make adjustments instead of discovering issues late when they are more difficult to address. Integrating this analysis into the standard design flow avoids creating legacy sustainability problems.\nIncentive Alignment: Economic and policy incentives should promote and reward sustainable AI development. These may include government grants, corporate initiatives, industry standards, and academic mandates for sustainability. Aligned incentives enable sustainability to become embedded in AI culture.\nSustainability Metrics and Goals: It is important to establish clearly defined Metrics that measure sustainability factors like carbon usage and energy efficiency. Establishing clear targets for these metrics provides concrete guidelines for teams to develop responsible AI systems. Tracking performance on metrics over time shows progress towards set sustainability goals.\nFairness, Transparency, and Accountability: Sustainable AI systems should be fair, transparent, and accountable. Models should be unbiased, with transparent development processes and mechanisms for auditing and redressing issues. This builds public trust and enables the identification of unsustainable practices.\nCross-disciplinary Collaboration: AI researchers teaming up with environmental scientists and engineers can lead to innovative systems that are high-performing yet environmentally friendly. Combining expertise from different fields from the start of projects enables sustainable thinking to be incorporated into the AI design process.\nEducation and Awareness: Workshops, training programs, and course curricula that cover AI sustainability raise awareness among the next generation of practitioners. This equips students with the knowledge to develop AI that consciously minimizes negative societal and environmental impacts. Instilling these values from the start shapes tomorrow’s professionals and company cultures.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1804,7 +1804,7 @@ "section": "16.9 Green AI Infrastructure", "text": "16.9 Green AI Infrastructure\nGreen AI represents a transformative approach to AI that incorporates environmental sustainability as a fundamental principle across the AI system design and lifecycle (R. Schwartz et al. 2020). This shift is driven by growing awareness of AI technologies’ significant carbon footprint and ecological impact, especially the compute-intensive process of training complex ML models.\n\nSchwartz, Roy, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2020. “Green AI.” Commun. ACM 63 (12): 54–63. https://doi.org/10.1145/3381831.\nThe essence of Green AI lies in its commitment to align AI advancement with sustainability goals around energy efficiency, renewable energy usage, and waste reduction. The introduction of Green AI ideals reflects maturing responsibility across the tech industry towards environmental stewardship and ethical technology practices. It moves beyond technical optimizations toward holistic life cycle assessment on how AI systems affect sustainability metrics. Setting new bars for ecologically conscious AI paves the way for the harmonious coexistence of technological progress and planetary health.\n\n16.9.1 Energy Efficient AI Systems\nEnergy efficiency in AI systems is a cornerstone of Green AI, aiming to reduce the energy demands traditionally associated with AI development and operations. This shift towards energy-conscious AI practices is vital in addressing the environmental concerns raised by the rapidly expanding field of AI. By focusing on energy efficiency, AI systems can become more sustainable, lessening their environmental impact and paving the way for more responsible AI use.\nAs we discussed earlier, the training and operation of AI models, especially large-scale ones, are known for their high energy consumption, which stems from compute-intensive model architecture and reliance on vast amounts of training data. For example, it is estimated that training a large state-of-the-art neural network model can have a carbon footprint of 284 tonnes—equivalent to the lifetime emissions of 5 cars (Strubell, Ganesh, and McCallum 2019).\n\nStrubell, Emma, Ananya Ganesh, and Andrew McCallum. 2019. “Energy and Policy Considerations for Deep Learning in NLP.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3645–50. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1355.\nTo tackle the massive energy demands, researchers and developers are actively exploring methods to optimize AI systems for better energy efficiency while maintaining model accuracy and performance. This includes techniques like the ones we have discussed in the model optimizations, efficient AI, and hardware acceleration chapters:\n\nKnowledge distillation to transfer knowledge from large AI models to miniature versions\nQuantization and pruning approaches that reduce computational and space complexities\nLow-precision numerics–lowering mathematical precision without impacting model quality\nSpecialized hardware like TPUs, neuromorphic chips tuned explicitly for efficient AI processing\n\nOne example is Intel’s work on Q8BERT—quantizing the BERT language model with 8-bit integers, leading to a 4x reduction in model size with minimal accuracy loss (Zafrir et al. 2019). The push for energy-efficient AI is not just a technical endeavor–it has tangible real-world implications. More performant systems lower AI’s operational costs and carbon footprint, making it accessible for widespread deployment on mobile and edge devices. It also paves the path toward the democratization of AI and mitigates unfair biases that can emerge from uneven access to computing resources across regions and communities. Pursuing energy-efficient AI is thus crucial for creating an equitable and sustainable future with AI.\n\nZafrir, Ofir, Guy Boudoukh, Peter Izsak, and Moshe Wasserblat. 2019. “Q8BERT: Quantized 8Bit BERT.” In 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS), 36–39. IEEE; IEEE. https://doi.org/10.1109/emc2-nips53020.2019.00016.\n\n\n16.9.2 Sustainable AI Infrastructure\nSustainable AI infrastructure includes the physical and technological frameworks that support AI systems, focusing on environmental sustainability. This involves designing and operating AI infrastructure to minimize ecological impact, conserve resources, and reduce carbon emissions. The goal is to create a sustainable ecosystem for AI that aligns with broader environmental objectives.\nGreen data centers are central to sustainable AI infrastructure, optimized for energy efficiency, and often powered by renewable energy sources. These data centers employ advanced cooling technologies (Ebrahimi, Jones, and Fleischer 2014), energy-efficient server designs (Uddin and Rahman 2012), and smart management systems (Buyya, Beloglazov, and Abawajy 2010) to reduce power consumption. The shift towards green computing infrastructure also involves adopting energy-efficient hardware, like AI-optimized processors that deliver high performance with lower energy requirements, which we discussed in the AI. Acceleration chapter. These efforts collectively reduce the carbon footprint of running large-scale AI operations.\n\nEbrahimi, Khosrow, Gerard F. Jones, and Amy S. Fleischer. 2014. “A Review of Data Center Cooling Technology, Operating Conditions and the Corresponding Low-Grade Waste Heat Recovery Opportunities.” Renewable Sustainable Energy Rev. 31 (March): 622–38. https://doi.org/10.1016/j.rser.2013.12.007.\n\nUddin, Mueen, and Azizah Abdul Rahman. 2012. “Energy Efficiency and Low Carbon Enabler Green IT Framework for Data Centers Considering Green Metrics.” Renewable Sustainable Energy Rev. 16 (6): 4078–94. https://doi.org/10.1016/j.rser.2012.03.014.\n\nBuyya, Rajkumar, Anton Beloglazov, and Jemal Abawajy. 2010. “Energy-Efficient Management of Data Center Resources for Cloud Computing: A Vision, Architectural Elements, and Open Challenges.” https://arxiv.org/abs/1006.0308.\n\nChua, L. 1971. “Memristor-the Missing Circuit Element.” #IEEE_J_CT# 18 (5): 507–19. https://doi.org/10.1109/tct.1971.1083337.\nIntegrating renewable energy sources, such as solar, wind, and hydroelectric power, into AI infrastructure is important for environmental sustainability (Chua 1971). Many tech companies and research institutions are investing in renewable energy projects to power their data centers. This not only helps in making AI operations carbon-neutral but also promotes the wider adoption of clean energy. Using renewable energy sources clearly shows commitment to environmental responsibility in the AI industry.\nSustainability in AI also extends to the materials and hardware used in creating AI systems. This involves choosing environmentally friendly materials, adopting recycling practices, and ensuring responsible electronic waste disposal. Efforts are underway to develop more sustainable hardware components, including energy-efficient chips designed for domain-specific tasks (such as AI accelerators) and environmentally friendly materials in device manufacturing (Cenci et al. 2021; Irimia-Vladu 2014). The lifecycle of these components is also a focus, with initiatives aimed at extending the lifespan of hardware and promoting recycling and reuse.\n\nCenci, Marcelo Pilotto, Tatiana Scarazzato, Daniel Dotto Munchen, Paula Cristina Dartora, Hugo Marcelo Veit, Andrea Moura Bernardes, and Pablo R. Dias. 2021. “Eco-Friendly ElectronicsA Comprehensive Review.” Adv. Mater. Technol. 7 (2): 2001263. https://doi.org/10.1002/admt.202001263.\n\nIrimia-Vladu, Mihai. 2014. ““Green” Electronics: Biodegradable and Biocompatible Materials and Devices for Sustainable Future.” Chem. Soc. Rev. 43 (2): 588–610. https://doi.org/10.1039/c3cs60235d.\nWhile strides are being made in sustainable AI infrastructure, challenges remain, such as the high costs of green technology and the need for global standards in sustainable practices. Future directions include more widespread adoption of green energy, further innovations in energy-efficient hardware, and international collaboration on sustainable AI policies. Pursuing sustainable AI infrastructure is not just a technical endeavor but a holistic approach that encompasses environmental, economic, and social aspects, ensuring that AI advances harmoniously with our planet’s health.\n\n\n16.9.3 Frameworks and Tools\nAccess to the right frameworks and tools is essential to effectively implementing green AI practices. These resources are designed to assist developers and researchers in creating more energy-efficient and environmentally friendly AI systems. They range from software libraries optimized for low-power consumption to platforms that facilitate the development of sustainable AI applications.\nSeveral software libraries and development environments are specifically tailored for Green AI. These tools often include features for optimizing AI models to reduce their computational load and, consequently, their energy consumption. For example, libraries in PyTorch and TensorFlow that support model pruning, quantization, and efficient neural network architectures enable developers to build AI systems that require less processing power and energy. Additionally, open-source communities like the Green Carbon Foundation are creating a centralized carbon intensity metric and building software for carbon-aware computing.\nEnergy monitoring tools are crucial for Green AI, as they allow developers to measure and analyze the energy consumption of their AI systems. By providing detailed insights into where and how energy is being used, these tools enable developers to make informed decisions about optimizing their models for better energy efficiency. This can involve adjustments in algorithm design, hardware selection, cloud computing software selection, or operational parameters. Figure 16.7 is a screenshot of an energy consumption dashboard provided by Microsoft’s cloud services platform.\n\n\n\n\n\n\nFigure 16.7: Microsoft Azure energy consumption dashboard. Credit: Will Buchanan.\n\n\n\nWith the increasing integration of renewable energy sources in AI operations, frameworks facilitating this process are becoming more important. These frameworks help manage the energy supply from renewable sources like solar or wind power, ensuring that AI systems can operate efficiently with fluctuating energy inputs.\nBeyond energy efficiency, sustainability assessment tools help evaluate the broader environmental impact of AI systems. These tools can analyze factors like the carbon footprint of AI operations, the lifecycle impact of hardware components (Gupta et al. 2022), and the overall sustainability of AI projects (Prakash, Callahan, et al. 2023).\n\nGupta, Udit, Mariam Elgamal, Gage Hills, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, and Carole-Jean Wu. 2022. “Act: Designing Sustainable Computer Systems with an Architectural Carbon Modeling Tool.” In Proceedings of the 49th Annual International Symposium on Computer Architecture, 784–99. ACM. https://doi.org/10.1145/3470496.3527408.\n\nPrakash, Shvetank, Tim Callahan, Joseph Bushagour, Colby Banbury, Alan V. Green, Pete Warden, Tim Ansell, and Vijay Janapa Reddi. 2023. “CFU Playground: Full-stack Open-Source Framework for Tiny Machine Learning (TinyML) Acceleration on FPGAs.” In 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Vol. abs/2201.01863. IEEE. https://doi.org/10.1109/ispass57527.2023.00024.\nThe availability and ongoing development of Green AI frameworks and tools are critical for advancing sustainable AI practices. By providing the necessary resources for developers and researchers, these tools facilitate the creation of more environmentally friendly AI systems and encourage a broader shift towards sustainability in the tech community. As Green AI continues to evolve, these frameworks and tools will play a vital role in shaping a more sustainable future for AI.\n\n\n16.9.4 Benchmarks and Leaderboards\nBenchmarks and leaderboards are important for driving progress in Green AI, as they provide standardized ways to measure and compare different methods. Well-designed benchmarks that capture relevant metrics around energy efficiency, carbon emissions, and other sustainability factors enable the community to track advancements fairly and meaningfully.\nExtensive benchmarks exist for tracking AI model performance, such as those extensively discussed in the Benchmarking chapter. Still, a clear and pressing need exists for additional standardized benchmarks focused on sustainability metrics like energy efficiency, carbon emissions, and overall ecological impact. Understanding the environmental costs of AI currently needs to be improved by a lack of transparency and standardized measurement around these factors.\nEmerging efforts such as the ML.ENERGY Leaderboard, which provides performance and energy consumption benchmarking results for large language models (LLMs) text generation, assists in enhancing the understanding of the energy cost of GenAI deployment.\nAs with any benchmark, Green AI benchmarks must represent realistic usage scenarios and workloads. Benchmarks that focus narrowly on easily gamed metrics may lead to short-term gains but fail to reflect actual production environments where more holistic efficiency and sustainability measures are needed. The community should continue expanding benchmarks to cover diverse use cases.\nWider adoption of common benchmark suites by industry players will accelerate innovation in Green AI by allowing easier comparison of techniques across organizations. Shared benchmarks lower the barrier to demonstrating the sustainability benefits of new tools and best practices. However, when designing industry-wide benchmarks, care must be taken around issues like intellectual property, privacy, and commercial sensitivity. Initiatives to develop open reference datasets for Green AI evaluation may help drive broader participation.\nAs methods and infrastructure for Green AI continue maturing, the community must revisit benchmark design to ensure existing suites capture new techniques and scenarios well. Tracking the evolving landscape through regular benchmark updates and reviews will be important to maintain representative comparisons over time. Community efforts for benchmark curation can enable sustainable benchmark suites that stand the test of time. Comprehensive benchmark suites owned by research communities or neutral third parties like MLCommons may encourage wider participation and standardization.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1815,7 +1815,7 @@ "section": "16.10 Case Study: Google’s 4Ms", "text": "16.10 Case Study: Google’s 4Ms\nOver the past decade, AI has rapidly moved from academic research to large-scale production systems powering numerous Google products and services. As AI models and workloads have grown exponentially in size and computational demands, concerns have emerged about their energy consumption and carbon footprint. Some researchers predicted runaway growth in ML’s energy appetite that could outweigh efficiencies gained from improved algorithms and hardware (Thompson et al. 2021).\n\nThompson, Neil C., Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso. 2021. “Deep Learning’s Diminishing Returns: The Cost of Improvement Is Becoming Unsustainable.” IEEE Spectr. 58 (10): 50–55. https://doi.org/10.1109/mspec.2021.9563954.\nHowever, Google’s production data reveals a different story—AI represents a steady 10-15% of total company energy usage from 2019 to 2021. This case study analyzes how Google applied a systematic approach leveraging four best practices—what they term the “4 Ms” of model efficiency, machine optimization, mechanization through cloud computing, and mapping to green locations—to bend the curve on emissions from AI workloads.\nThe scale of Google’s AI usage makes it an ideal case study. In 2021 alone, the company trained models like the 1.2 trillion-parameter GLam model. Analyzing how the application of AI has been paired with rapid efficiency gains in this environment helps us by providing a logical blueprint for the broader AI field to follow.\nBy transparently publishing detailed energy usage statistics, adopting rates of carbon-free clouds and renewables purchases, and more, alongside its technical innovations, Google has enabled outside researchers to measure progress accurately. Their study in the ACM CACM (Patterson et al. 2022) highlights how the company’s multipronged approach shows that runaway AI energy consumption predictions can be overcome by focusing engineering efforts on sustainable development patterns. The pace of improvements also suggests ML’s efficiency gains are just starting.\n\nPatterson, David, Joseph Gonzalez, Urs Holzle, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David R. So, Maud Texier, and Jeff Dean. 2022. “The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink.” Computer 55 (7): 18–28. https://doi.org/10.1109/mc.2022.3148714.\n\n16.10.1 Google’s 4M Best Practices\nTo curb emissions from their rapidly expanding AI workloads, Google engineers systematically identified four best practice areas–termed the “4 Ms”–where optimizations could compound to reduce the carbon footprint of ML:\n\nModel - Selecting efficient AI model architectures can reduce computation by 5-10X with no loss in model quality. Google has extensively researched developing sparse models and neural architecture search to create more efficient models like the Evolved Transformer and Primer.\nMachine—Using hardware optimized for AI over general-purpose systems improves performance per watt by 2-5X. Google’s Tensor Processing Units (TPUs) led to 5-13X better carbon efficiency versus GPUs not optimized for ML.\nMechanization—By leveraging cloud computing systems tailored for high utilization over conventional on-premise data centers, energy costs are reduced by 1.4-2X. Google cites its data center’s power usage effectiveness as outpacing industry averages.\nMap - Choosing data center locations with low-carbon electricity reduces gross emissions by another 5-10X. Google provides real-time maps highlighting the percentage of renewable energy used by its facilities.\n\nTogether, these practices created drastic compound efficiency gains. For example, optimizing the Transformer AI model on TPUs in a sustainable data center location cut energy use by 83. It lowered \\(\\textrm{CO}_2\\) emissions by a factor of 747.\n\n\n16.10.2 Significant Results\nDespite exponential growth in AI adoption across products and services, Google’s efforts to improve the carbon efficiency of ML have produced measurable gains, helping to restrain overall energy appetite. One key data point highlighting this progress is that AI workloads have remained a steady 10% to 15% of total company energy use from 2019 to 2021. As AI became integral to more Google offerings, overall compute cycles dedicated to AI grew substantially. However, efficiencies in algorithms, specialized hardware, data center design, and flexible geography allowed sustainability to keep pace—with AI representing just a fraction of total data center electricity over years of expansion.\nOther case studies underscore how an engineering focus on sustainable AI development patterns enabled rapid quality improvements in lockstep with environmental gains. For example, the natural language processing model GPT-3 was viewed as state-of-the-art in mid-2020. Yet its successor GLaM improved accuracy while cutting training compute needs and using cleaner data center energy–cutting CO2 emissions by a factor of 14 in just 18 months of model evolution.\nSimilarly, Google found past published speculation missing the mark on ML’s energy appetite by factors of 100 to 100,000X due to a lack of real-world metrics. By transparently tracking optimization impact, Google hoped to motivate efficiency while preventing overestimated extrapolations about ML’s environmental toll.\nThese data-driven case studies show how companies like Google are steering AI advancements toward sustainable trajectories and improving efficiency to outpace adoption growth. With further efforts around lifecycle analysis, inference optimization, and renewable expansion, companies can aim to accelerate progress, giving evidence that ML’s clean potential is only just being unlocked by current gains.\n\n\n16.10.3 Further Improvements\nWhile Google has made measurable progress in restraining the carbon footprint of its AI operations, the company recognizes further efficiency gains will be vital for responsible innovation given the technology’s ongoing expansion.\nOne area of focus is showing how advances are often incorrectly viewed as increasing unsustainable computing—like neural architecture search (NAS) to find optimized models— spur downstream savings, outweighing their upfront costs. Despite expending more energy on model discovery rather than hand-engineering, NAS cuts lifetime emissions by producing efficient designs callable across countless applications.\nAdditionally, the analysis reveals that focusing sustainability efforts on data center and server-side optimization makes sense, given the dominant energy draw versus consumer devices. Though Google aims to shrink inference impacts across processors like mobile phones, priority rests on improving training cycles and data center renewables procurement for maximal effect.\nTo that end, Google’s progress in pooling computing inefficiently designed cloud facilities highlights the value of scale and centralization. As more workloads shift away from inefficient on-premise servers, internet giants’ prioritization of renewable energy—with Google and Facebook matched 100% by renewables since 2017 and 2020, respectively—unlocks compounding emissions cuts.\nTogether, these efforts emphasize that while no resting on laurels is possible, Google’s multipronged approach shows that AI efficiency improvements are only accelerating. Cross-domain initiatives around lifecycle assessment, carbon-conscious development patterns, transparency, and matching rising AI demand with clean electricity supply pave a path toward bending the curve further as adoption grows. The company’s results compel the broader field towards replicating these integrated sustainability pursuits.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1826,7 +1826,7 @@ "section": "16.11 Embedded AI - Internet of Trash", "text": "16.11 Embedded AI - Internet of Trash\nWhile much attention has focused on making the immense data centers powering AI more sustainable, an equally pressing concern is the movement of AI capabilities into smart edge devices and endpoints. Edge/embedded AI allows near real-time responsiveness without connectivity dependencies. It also reduces transmission bandwidth needs. However, the increase of tiny devices leads to other risks.\nTiny computers, microcontrollers, and custom ASICs powering edge intelligence face size, cost, and power limitations that rule out high-end GPUs used in data centers. Instead, they require optimized algorithms and extremely compact, energy-efficient circuitry to run smoothly. However, engineering for these microscopic form factors opens up risks around planned obsolescence, disposability, and waste. Figure 16.8 shows that the number of IoT devices is projected to reach 30 billion connected devices by 2030.\n\n\n\n\n\n\nFigure 16.8: Number of Internet of Things (IoT) connected devices worldwide from 2019 to 2023. Credit: Statista.\n\n\n\nEnd-of-life handling of internet-connected gadgets embedded with sensors and AI remains an often overlooked issue during design. However, these products permeate consumer goods, vehicles, public infrastructure, industrial equipment, and more.\n\nE-waste\nElectronic waste, or e-waste, refers to discarded electrical equipment and components that enter the waste stream. This includes devices that have to be plugged in, have a battery, or electrical circuitry. With the rising adoption of internet-connected smart devices and sensors, e-waste volumes rapidly increase yearly. These proliferating gadgets contain toxic heavy metals like lead, mercury, and cadmium that become environmental and health hazards when improperly disposed of.\nThe amount of electronic waste being produced is growing at an alarming rate. Today, we already produce 50 million tons per year. By 2030, that figure is projected to jump to a staggering 75 million tons as consumer electronics consumption continues to accelerate. Global e-waste production will reach 120 million tonnes annually by 2050 (Un and Forum 2019). The soaring production and short lifecycles of our gadgets fuel this crisis, from smartphones and tablets to internet-connected devices and home appliances.\nDeveloping nations are being hit the hardest as they need more infrastructure to process obsolete electronics safely. In 2019, formal e-waste recycling rates in poorer countries ranged from 13% to 23%. The remainder ends up illegally dumped, burned, or crudely dismantled, releasing toxic materials into the environment and harming workers and local communities. Clearly, more needs to be done to build global capacity for ethical and sustainable e-waste management, or we risk irreversible damage.\nThe danger is that crude handling of electronics to strip valuables exposes marginalized workers and communities to noxious burnt plastics/metals. Lead poisoning poses especially high risks to child development if ingested or inhaled. Overall, only about 20% of e-waste produced was collected using environmentally sound methods, according to UN estimates (Un and Forum 2019). So solutions for responsible lifecycle management are urgently required to contain the unsafe disposal as volume soars higher.\n\nUn, and World Economic Forum. 2019. A New Circular Vision for Electronics, Time for a Global Reboot. PACE - Platform for Accelerating the Circular Economy. https://www3.weforum.org/docs/WEF\\_A\\_New\\_Circular\\_Vision\\_for\\_Electronics.pdf.\n\n\nDisposable Electronics\nThe rapidly falling costs of microcontrollers, tiny rechargeable batteries, and compact communication hardware have enabled the embedding of intelligent sensor systems throughout everyday consumer goods. These internet-of-things (IoT) devices monitor product conditions, user interactions, and environmental factors to enable real-time responsiveness, personalization, and data-driven business decisions in the evolving connected marketplace.\nHowever, these embedded electronics face little oversight or planning around sustainably handling their eventual disposal once the often plastic-encased products are discarded after brief lifetimes. IoT sensors now commonly reside in single-use items like water bottles, food packaging, prescription bottles, and cosmetic containers that overwhelmingly enter landfill waste streams after a few weeks to months of consumer use.\nThe problem accelerates as more manufacturers rush to integrate mobile chips, power sources, Bluetooth modules, and other modern silicon ICs, costing under US$1, into various merchandise without protocols for recycling, replacing batteries, or component reusability. Despite their small individual size, the volumes of these devices and lifetime waste burden loom large. Unlike regulating larger electronics, few policy constraints exist around materials requirements or toxicity in tiny disposable gadgets.\nWhile offering convenience when working, the unsustainable combination of difficult retrievability and limited safe breakdown mechanisms causes disposable connected devices to contribute outsized shares of future e-waste volumes needing urgent attention.\n\n\nPlanned Obsolescence\nPlanned obsolescence refers to the intentional design strategy of manufacturing products with artificially limited lifetimes that quickly become non-functional or outdated. This spurs faster replacement purchase cycles as consumers find devices no longer meet their needs within a few years. However, electronics designed for premature obsolescence contribute to unsustainable e-waste volumes.\nFor example, gluing smartphone batteries and components together hinders repairability compared to modular, accessible assemblies. Rolling out software updates that deliberately slow system performance creates a perception that upgrading devices produced only several years earlier is worth it.\nLikewise, fashionable introductions of new product generations with minor but exclusive feature additions make prior versions rapidly seem dated. These tactics compel buying new gadgets (e.g., iPhones) long before operational endpoints. When multiplied across fast-paced electronics categories, billions of barely worn items are discarded annually.\nPlanned obsolescence thus intensifies resource utilization and waste creation in making products with no intention for long lifetimes. This contradicts sustainability principles around durability, reuse, and material conservation. While stimulating continuous sales and gains for manufacturers in the short term, the strategy externalizes environmental costs and toxins onto communities lacking proper e-waste processing infrastructure.\nPolicy and consumer action are crucial to counter gadget designs that are needlessly disposable by default. Companies should also invest in product stewardship programs supporting responsible reuse and reclamation.\nConsider the real-world example. Apple has faced scrutiny over the years for allegedly engaging in planned obsolescence to encourage customers to buy new iPhone models. The company allegedly designed its phones so that performance degrades over time or existing features become incompatible with new operating systems, which critics argue is meant to spur more rapid upgrade cycles. In 2020, Apple paid a 25 million Euros fine to settle a case in France where regulators found the company guilty of intentionally slowing down older iPhones without clearly informing customers via iOS updates.\nBy failing to be transparent about power management changes that reduced device performance, Apple participated in deceptive activities that reduced product lifespan to drive sales. The company claimed it was done to “smooth out” peaks that could suddenly cause older batteries to shut down. However, this example highlights the legal risks around employing planned obsolescence and not properly disclosing when functionality changes impact device usability over time- even leading brands like Apple can run into trouble if perceived as intentionally shortening product life cycles.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1837,7 +1837,7 @@ "section": "16.12 Policy and Regulatory Considerations", "text": "16.12 Policy and Regulatory Considerations\n\n16.12.1 Measurement and Reporting Mandates\nOne policy mechanism that is increasingly relevant for AI systems is measurement and reporting requirements regarding energy consumption and carbon emissions. Mandated metering, auditing, disclosures, and more rigorous methodologies aligned to sustainability metrics can help address information gaps hindering efficiency optimizations.\nSimultaneously, national or regional policies require companies above a certain size to utilize AI in their products or backend systems to report energy consumption or emissions associated with major AI workloads. Organizations like the Partnership on AI, IEEE, and NIST could help shape standardized methodologies. More complex proposals involve defining consistent ways to measure computational complexity, data center PUE, carbon intensity of energy supply, and efficiencies gained through AI-specific hardware.\nReporting obligations for public sector users procuring AI services—such as through proposed legislation in Europe—could also increase transparency. However, regulators must balance the additional measurement burden such mandates place on organizations against ongoing carbon reductions from ingraining sustainability-conscious development patterns.\nTo be most constructive, any measurement and reporting policies should focus on enabling continuous refinement rather than simplistic restrictions or caps. As AI advancements unfold rapidly, nimble governance guardrails that embed sustainability considerations into normal evaluation metrics can motivate positive change. However, overprescription risks constraining innovation if requirements grow outdated. AI efficiency policy aims to accelerate progress industry-wide by combining flexibility with appropriate transparency guardrails.\n\n\n16.12.2 Restriction Mechanisms\nIn addition to reporting mandates, policymakers have several restriction mechanisms that could directly shape how AI systems are developed and deployed to curb emissions:\nCaps on Computing Emissions: The European Commission’s proposed AI Act takes a horizontal approach that could allow setting economy-wide caps on the volume of computing power available for training AI models. Like emissions trading systems, caps aim to disincentivize extensive computing over sustainability indirectly. However, model quality could be improved to provide more pathways for procuring additional capacity.\nConditioning Access to Public Resources: Some experts have proposed incentives like only allowing access to public datasets or computing power for developing fundamentally efficient models rather than extravagant architectures. For example, the MLCommons benchmarking consortium founded by major tech firms could formally integrate efficiency into its standardized leaderboard metrics—however, conditioned access risks limiting innovation.\nFinancial Mechanisms: Analogous to carbon taxes on polluting industries, fees applied per unit of AI-related compute consumption could discourage unnecessary model scaling while funding efficiency innovations. Tax credits could alternatively reward organizations pioneering more accurate but compact AI techniques. However, financial tools require careful calibration between revenue generation and fairness and not over-penalizing productive uses of AI.\nTechnology Bans: If measurement consistently pinned extreme emissions on specific applications of AI without paths for remediation, outright bans present a tool of last resort for policymakers. However, given AI’s dual use, defining harmful versus beneficial deployments proves complex, necessitating holistic impact assessment before concluding no redeeming value exists. Banning promising technologies risks unintended consequences and requires caution.\n\n\n16.12.3 Government Incentives\nIt is a common practice for governments to provide tax or other incentives to consumers or businesses when contributing to more sustainable technological practices. Such incentives already exist in the US for adopting solar panels or energy-efficient buildings. To the best of our knowledge, no such tax incentives exist for AI-specific development practices yet.\nAnother potential incentive program that is beginning to be explored is using government grants to fund Green AI projects. For example, in Spain, 300 million euros have been allocated to specifically fund projects in AI and sustainability. Government incentives are a promising avenue to encourage sustainable business and consumer behavior practices, but careful thought is required to determine how those incentives will fit into market demands (Cohen, Lobel, and Perakis 2016).\n\nCohen, Maxime C., Ruben Lobel, and Georgia Perakis. 2016. “The Impact of Demand Uncertainty on Consumer Subsidies for Green Technology Adoption.” Manage. Sci. 62 (5): 1235–58. https://doi.org/10.1287/mnsc.2015.2173.\n\n\n16.12.4 Self-Regulation\nComplimentary to potential government action, voluntary self-governance mechanisms allow the AI community to pursue sustainability ends without top-down intervention:\nRenewables Commitments: Large AI practitioners like Google, Microsoft, Amazon, and Facebook have pledged to procure enough renewable electricity to match 100% of their energy demands. These commitments unlock compounding emissions cuts as compute scales up. Formalizing such programs incentivizes green data center regions. However, there are critiques on whether these pledges are enough (Monyei and Jenkins 2018).\n\nMonyei, Chukwuka G., and Kirsten E. H. Jenkins. 2018. “Electrons Have No Identity: Setting Right Misrepresentations in Google and Apple’s Clean Energy Purchasing.” Energy Research &Amp; Social Science 46 (December): 48–51. https://doi.org/10.1016/j.erss.2018.06.015.\nInternal Carbon Prices: Some organizations utilize shadow prices on carbon emissions to represent environmental costs in capital allocation decisions between AI projects. If modeled effectively, theoretical charges on development carbon footprints steer funding toward efficient innovations rather than solely accuracy gains.\nEfficiency Development Checklists: Groups like the AI Sustainability Coalition suggest voluntary checklist templates highlighting model design choices, hardware configurations, and other factors architects can tune per application to restrain emissions. Organizations can drive change by ingraining sustainability as a primary success metric alongside accuracy and cost.\nIndependent Auditing: Even absent public disclosure mandates, firms specializing in technology sustainability audits help AI developers identify waste, create efficiency roadmaps, and benchmark progress via impartial reviews. Structuring such audits into internal governance procedures or the procurement process expands accountability.\n\n\n16.12.5 Global Considerations\nWhile measurement, restrictions, incentives, and self-regulation represent potential policy mechanisms for furthering AI sustainability, fragmentation across national regimes risks unintended consequences. As with other technology policy domains, divergence between regions must be carefully managed.\nFor example, due to regional data privacy concerns, OpenAI barred European users from accessing its viral ChatGPT chatbot. This came after the EU’s proposed AI Act signaled a precautionary approach, allowing the EC to ban certain high-risk AI uses and enforcing transparency rules that create uncertainty for releasing brand new models. However, it would be wise to caution against regulator action as it could inadvertently limit European innovation if regimes with lighter-touch regulation attract more private-sector AI research spending and talent. Finding common ground is key.\nThe OECD principles on AI and the United Nations frameworks underscore universally agreed-upon tenets all national policies should uphold: transparency, accountability, bias mitigation, and more. Constructively embedding sustainability as a core principle for responsible AI within international guidance can motivate unified action without sacrificing flexibility across divergent legal systems. Avoiding race-to-the-bottom dynamics hinges on enlightened multilateral cooperation.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1848,7 +1848,7 @@ "section": "16.13 Public Perception and Engagement", "text": "16.13 Public Perception and Engagement\nAs societal attention and policy efforts aimed at environmental sustainability ramp up worldwide, there is growing enthusiasm for leveraging AI to help address ecological challenges. However, public understanding and attitudes toward the role of AI systems in sustainability contexts still need to be clarified and clouded by misconceptions. On the one hand, people hope advanced algorithms can provide new solutions for green energy, responsible consumption, decarbonization pathways, and ecosystem preservation. On the other, fears regarding the risks of uncontrolled AI also seep into the environmental domain and undermine constructive discourse. Furthermore, a lack of public awareness on key issues like transparency in developing sustainability-focused AI tools and potential biases in data or modeling also threaten to limit inclusive participation and degrade public trust.\nTackling complex, interdisciplinary priorities like environmental sustainability requires informed, nuanced public engagement and responsible advances in AI innovation. The path forward demands careful, equitable collaborative efforts between experts in ML, climate science, environmental policy, social science, and communication. Mapping the landscape of public perceptions, identifying pitfalls, and charting strategies to cultivate understandable, accessible, and trustworthy AI systems targeting shared ecological priorities will prove essential to realizing sustainability goals. This complex terrain warrants a deep examination of the sociotechnical dynamics involved.\n\n16.13.1 AI Awareness\nIn May 2022, the Pew Research Center polled 5,101 US adults, finding 60% had heard or read “a little” about AI while 27% heard “a lot”–indicating decent broad recognition, but likely limited comprehension about details or applications. However, among those with some AI familiarity, concerns emerge regarding risks of personal data misuse according to agreed terms. Still, 62% felt AI could ease modern life if applied responsibly. Yet, a specific understanding of sustainability contexts still needs to be improved.\nStudies attempting to categorize online discourse sentiments find a nearly even split between optimism and caution regarding deploying AI for sustainability goals. Factors driving positivity include hopes around better forecasting of ecological shifts using ML models. Negativity arises from a lack of confidence in self-supervised algorithms avoiding unintended consequences due to unpredictable human impacts on complex natural systems during training.\nThe most prevalent public belief remains that while AI does harbor the potential for accelerating solutions on issues like emission reductions and wildlife protections, inadequate safeguarding around data biases, ethical blindspots, and privacy considerations could be more appreciated risks if pursued carelessly, especially at scale. This leads to hesitancy around unconditional support without evidence of deliberate, democratically guided development.\n\n\n16.13.2 Messaging\nOptimistic efforts are highlighting AI’s sustainability promise and emphasize the potential for advanced ML to radically accelerate decarbonization effects from smart grids, personalized carbon tracking apps, automated building efficiency optimizations, and predictive analytics guiding targeted conservation efforts. More comprehensive real-time modeling of complex climate and ecological shifts using self-improving algorithms offers hope for mitigating biodiversity losses and averting worst-case scenarios.\nHowever, cautionary perspectives, such as the Asilomar AI Principles, question whether AI itself could exacerbate sustainability challenges if improperly constrained. The rising energy demands of large-scale computing systems and the increasingly massive neural network model training conflict with clean energy ambitions. Lack of diversity in data inputs or developers’ priorities may downplay urgent environmental justice considerations. Near-term skeptical public engagement likely hinges on a need for perceivable safeguards against uncontrolled AI systems running amok on core ecological processes.\nIn essence, polarized framings either promote AI as an indispensable tool for sustainability problem-solving–if compassionately directed toward people and the planet–or present AI as an amplifier of existing harms insidiously dominating hidden facets of natural systems central to all life. Overcoming such impasses demands balancing honest trade-off discussions with shared visions for equitable, democratically governed technological progress targeting restoration.\n\n\n16.13.3 Equitable Participation\nEnsuring equitable participation and access should form a cornerstone of any sustainability initiative with the potential for major societal impacts. This principle applies equally to AI systems targeting environmental goals. However, commonly excluded voices like frontline, rural, or indigenous communities and future generations not present to consent could suffer disproportionate consequences from technology transformations. For instance, the Partnership on AI has launched events expressly targeting input from marginalized communities on deploying AI responsibly.\nEnsuring equitable access and participation should form a cornerstone of any sustainability initiative with the potential for major societal impacts, whether AI or otherwise. However, inclusive engagement in environmental AI relies partly on the availability and understanding of fundamental computing resources. As the recent OECD report on National AI Compute Capacity highlights (Oecd 2023), many countries currently lack data or strategic plans mapping needs for the infrastructure required to fuel AI systems. This policy blindspot could constrain economic goals and exacerbate barriers to entry for marginalized populations. Their blueprint urges developing national AI compute capacity strategies along dimensions of capacity, accessibility, innovation pipelines, and resilience to anchor innovation. The underlying data storage needs to be improved, and model development platforms or specialized hardware could inadvertently concentrate AI progress in the hands of select groups. Therefore, planning for a balanced expansion of fundamental AI computing resources via policy initiatives ties directly to hopes for democratized sustainability problem-solving using equitable and transparent ML tools.\n\nOecd. 2023. “A Blueprint for Building National Compute Capacity for Artificial Intelligence.” 350. Organisation for Economic Co-Operation; Development (OECD). https://doi.org/10.1787/876367e3-en.\nThe key idea is that equitable participation in AI systems targeting environmental challenges relies in part on ensuring the underlying computing capacity and infrastructure are correct, which requires proactive policy planning from a national perspective.\n\n\n16.13.4 Transparency\nAs public sector agencies and private companies alike rush towards adopting AI tools to help tackle pressing environmental challenges, calls for transparency around these systems’ development and functionality have begun to amplify. Explainable and interpretable ML features grow more crucial for building trust in emerging models aiming to guide consequential sustainability policies. Initiatives like the Montreal Carbon Pledge brought tech leaders together to commit to publishing impact assessments before launching environmental systems, as pledged below:\n*“As institutional investors, we must act in the best long-term interests of our beneficiaries. In this fiduciary role, long-term investment risks are associated with greenhouse gas emissions, climate change, and carbon regulation.\nMeasuring our carbon footprint is integral to understanding better, quantifying, and managing the carbon and climate change-related impacts, risks, and opportunities in our investments. Therefore, as a first step, we commit to measuring and disclosing the carbon footprint of our investments annually to use this information to develop an engagement strategy and identify and set carbon footprint reduction targets.”*\nWe need a similar pledge for AI sustainability and responsibility. Widespread acceptance and impact of AI sustainability solutions will partly be on deliberate communication of validation schemes, metrics, and layers of human judgment applied before live deployment. Efforts like NIST’s Principles for Explainable AI can help foster transparency into AI systems. The National Institute of Standards and Technology (NIST) has published an influential set of guidelines dubbed the Principles for Explainable AI (Phillips et al. 2020). This framework articulates best practices for designing, evaluating, and deploying responsible AI systems with transparent and interpretable features that build critical user understanding and trust.\n\nPhillips, P Jonathon, Carina A Hahn, Peter C Fontana, David A Broniatowski, and Mark A Przybocki. 2020. “Four Principles of Explainable Artificial Intelligence.” Gaithersburg, Maryland 18.\nIt delineates four core principles: Firstly, AI systems should provide contextually relevant explanations justifying the reasoning behind their outputs to appropriate stakeholders. Secondly, these AI explanations must communicate information meaningfully for their target audience’s appropriate comprehension level. Next is the accuracy principle, which dictates that explanations should faithfully reflect the actual process and logic informing an AI model’s internal mechanics for generating given outputs or recommendations based on inputs. Finally, a knowledge limits principle compels explanations to clarify an AI model’s boundaries in capturing the full breadth of real-world complexity, variance, and uncertainties within a problem space.\nAltogether, these NIST principles offer AI practitioners and adopters guidance on key transparency considerations vital for developing accessible solutions prioritizing user autonomy and trust rather than simply maximizing predictive accuracy metrics alone. As AI rapidly advances across sensitive social contexts like healthcare, finance, employment, and beyond, such human-centered design guidelines will continue growing in importance for anchoring innovation to public interests.\nThis applies equally to the domain of environmental ability. Responsible and democratically guided AI innovation targeting shared ecological priorities depends on maintaining public vigilance, understanding, and oversight over otherwise opaque systems taking prominent roles in societal decisions. Prioritizing explainable algorithm designs and radical transparency practices per global standards can help sustain collective confidence that these tools improve rather than imperil hopes for a driven future.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1859,7 +1859,7 @@ "section": "16.14 Future Directions and Challenges", "text": "16.14 Future Directions and Challenges\nAs we look towards the future, the role of AI in environmental sustainability is poised to grow even more significant. AI’s potential to drive advancements in renewable energy, climate modeling, conservation efforts, and more is immense. However, it is a two-sided coin, as we need to overcome several challenges and direct our efforts towards sustainable and responsible AI development.\n\n16.14.1 Future Directions\nOne key future direction is the development of more energy-efficient AI models and algorithms. This involves ongoing research and innovation in areas like model pruning, quantization, and the use of low-precision numerics, as well as developing the hardware to enable full profitability of these innovations. Even further, we look at alternative computing paradigms that do not rely on von-Neumann architectures. More on this topic can be found in the hardware acceleration chapter. The goal is to create AI systems that deliver high performance while minimizing energy consumption and carbon emissions.\nAnother important direction is the integration of renewable energy sources into AI infrastructure. As data centers continue to be major contributors to AI’s carbon footprint, transitioning to renewable energy sources like solar and wind is crucial. Developments in long-term, sustainable energy storage, such as Ambri, an MIT spinoff, could enable this transition. This requires significant investment and collaboration between tech companies, energy providers, and policymakers.\n\n\n16.14.2 Challenges\nDespite these promising directions, several challenges need to be addressed. One of the major challenges is the need for consistent standards and methodologies for measuring and reporting the environmental impact of AI. These methods must capture the complexity of the life cycles of AI models and system hardware. Next, efficient and environmentally sustainable AI infrastructure and system hardware are needed. This consists of three components. It aims to maximize the utilization of accelerator and system resources, prolong the lifetime of AI infrastructure, and design systems hardware with environmental impact in mind.\nOn the software side, we should trade off experimentation and the subsequent training cost. Techniques such as neural architecture search and hyperparameter optimization can be used for design space exploration. However, these are often very resource-intensive. Efficient experimentation can significantly reduce the environmental footprint overhead. Next, methods to reduce wasted training efforts should be explored.\nTo improve model quality, we often scale the dataset. However, the increased system resources required for data storage and ingestion caused by this scaling have a significant environmental impact (Wu et al. 2022). A thorough understanding of the rate at which data loses its predictive value and devising data sampling strategies is important.\n\nWu, Carole-Jean, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, et al. 2022. “Sustainable Ai: Environmental Implications, Challenges and Opportunities.” Proceedings of Machine Learning and Systems 4: 795–813.\nData gaps also pose a significant challenge. Without companies and governments openly sharing detailed and accurate data on energy consumption, carbon emissions, and other environmental impacts, it isn’t easy to develop effective strategies for sustainable AI.\nFinally, the fast pace of AI development requires an agile approach to the policy imposed on these systems. The policy should ensure sustainable development without constraining innovation. This requires experts in all domains of AI, environmental sciences, energy, and policy to work together to achieve a sustainable future.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1870,7 +1870,7 @@ "section": "16.15 Conclusion", "text": "16.15 Conclusion\nWe must address sustainability considerations as AI rapidly expands across industries and society. AI promises breakthrough innovations, yet its environmental footprint threatens its widespread growth. This chapter analyzes multiple facets, from energy and emissions to waste and biodiversity impacts, that AI/ML developers must weigh when creating responsible AI systems.\nFundamentally, we require elevating sustainability as a primary design priority rather than an afterthought. Techniques like energy-efficient models, renewable-powered data centers, and hardware recycling programs offer solutions, but the holistic commitment remains vital. We need standards around transparency, carbon accounting, and supply chain disclosures to supplement technical gains. Still, examples like Google’s 4M efficiency practices containing ML energy use highlight that we can advance AI in lockstep with environmental objectives with concerted effort. We achieve this harmonious balance by having researchers, corporations, regulators, and users collaborate across domains. The aim is not perfect solutions but continuous improvement as we integrate AI across new sectors.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1879,9 +1879,9 @@ "href": "contents/sustainable_ai/sustainable_ai.html#sec-sustainable-ai-resource", "title": "16  Sustainable AI", "section": "16.16 Resources", - "text": "16.16 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nTransparency and Sustainability.\nSustainability of TinyML.\nModel Cards for Transparency.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 16.1\nExercise 16.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "16.16 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nTransparency and Sustainability.\nSustainability of TinyML.\nModel Cards for Transparency.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 16.1\nExercise 16.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer hands-on labs that allow students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Advanced Topics", "16  Sustainable AI" ] }, @@ -1892,7 +1892,7 @@ "section": "", "text": "17.1 Introduction\nRobust AI refers to a system’s ability to maintain its performance and reliability in the presence of hardware, software, and errors. A robust machine learning system is designed to be fault-tolerant and error-resilient, capable of operating effectively even under adverse conditions.\nAs ML systems become increasingly integrated into various aspects of our lives, from cloud-based services to edge devices and embedded systems, the impact of hardware and software faults on their performance and reliability becomes more significant. In the future, as ML systems become more complex and are deployed in even more critical applications, the need for robust and fault-tolerant designs will be paramount.\nML systems are expected to play crucial roles in autonomous vehicles, smart cities, healthcare, and industrial automation domains. In these domains, the consequences of hardware or software faults can be severe, potentially leading to loss of life, economic damage, or environmental harm.\nResearchers and engineers must focus on developing advanced techniques for fault detection, isolation, and recovery to mitigate these risks and ensure the reliable operation of future ML systems.\nThis chapter will focus specifically on three main categories of faults and errors that can impact the robustness of ML systems: hardware faults, software faults, and human errors.\nThe specific challenges and approaches to achieving robustness may vary depending on the scale and constraints of the ML system. Large-scale cloud computing or data center systems may focus on fault tolerance and resilience through redundancy, distributed processing, and advanced error detection and correction techniques. In contrast, resource-constrained edge devices or embedded systems face unique challenges due to limited computational power, memory, and energy resources.\nRegardless of the scale and constraints, the key characteristics of a robust ML system include fault tolerance, error resilience, and performance maintenance. By understanding and addressing the multifaceted challenges to robustness, we can develop trustworthy and reliable ML systems that can navigate the complexities of real-world environments.\nThis chapter is not just about exploring ML systems’ tools, frameworks, and techniques for detecting and mitigating faults, attacks, and distributional shifts. It’s about emphasizing the crucial role of each one of you in prioritizing resilience throughout the AI development lifecycle, from data collection and model training to deployment and monitoring. By proactively addressing the challenges to robustness, we can unlock the full potential of ML technologies while ensuring their safe, reliable, and responsible deployment in real-world applications.\nAs AI continues to shape our future, the potential of ML technologies is immense. But it’s only when we build resilient systems that can withstand the challenges of the real world that we can truly harness this potential. This is a defining factor in the success and societal impact of this transformative technology, and it’s within our reach.", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1903,7 +1903,7 @@ "section": "", "text": "Hardware Faults: Transient, permanent, and intermittent faults can affect the hardware components of an ML system, corrupting computations and degrading performance.\nModel Robustness: ML models can be vulnerable to adversarial attacks, data poisoning, and distribution shifts, which can induce targeted misclassifications, skew the model’s learned behavior, or compromise the system’s integrity and reliability.\nSoftware Faults: Bugs, design flaws, and implementation errors in the software components, such as algorithms, libraries, and frameworks, can propagate errors and introduce vulnerabilities.", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1914,7 +1914,7 @@ "section": "17.2 Real-World Examples", "text": "17.2 Real-World Examples\nHere are some real-world examples of cases where faults in hardware or software have caused major issues in ML systems across cloud, edge, and embedded environments:\n\n17.2.1 Cloud\nIn February 2017, Amazon Web Services (AWS) experienced a significant outage due to human error during maintenance. An engineer inadvertently entered an incorrect command, causing many servers to be taken offline. This outage disrupted many AWS services, including Amazon’s AI-powered assistant, Alexa. As a result, Alexa-powered devices, such as Amazon Echo and third-party products using Alexa Voice Service, could not respond to user requests for several hours. This incident highlights the potential impact of human errors on cloud-based ML systems and the need for robust maintenance procedures and failsafe mechanisms.\nIn another example (Vangal et al. 2021), Facebook encountered a silent data corruption (SDC) issue within its distributed querying infrastructure, as shown in Figure 17.1. Facebook’s infrastructure includes a querying system that fetches and executes SQL and SQL-like queries across multiple datasets using frameworks like Presto, Hive, and Spark. One of the applications that utilized this querying infrastructure was a compression application to reduce the footprint of data stores. In this compression application, files were compressed when not being read and decompressed when a read request was made. Before decompression, the file size was checked to ensure it was greater than zero, indicating a valid compressed file with contents.\n\nVangal, Sriram, Somnath Paul, Steven Hsu, Amit Agarwal, Saurabh Kumar, Ram Krishnamurthy, Harish Krishnamurthy, James Tschanz, Vivek De, and Chris H. Kim. 2021. “Wide-Range Many-Core SoC Design in Scaled CMOS: Challenges and Opportunities.” IEEE Trans. Very Large Scale Integr. VLSI Syst. 29 (5): 843–56. https://doi.org/10.1109/tvlsi.2021.3061649.\n\n\n\n\n\n\nFigure 17.1: Silent data corruption in database applications (Credit: Facebook)\n\n\n\nHowever, in one instance, when the file size was being computed for a valid non-zero-sized file, the decompression algorithm invoked a power function from the Scala library. Unexpectedly, the Scala function returned a zero size value for the file despite having a known non-zero decompressed size. As a result, the decompression was not performed, and the file was not written to the output database. This issue manifested sporadically, with some occurrences of the same file size computation returning the correct non-zero value.\nThe impact of this silent data corruption was significant, leading to missing files and incorrect data in the output database. The application relying on the decompressed files failed due to the data inconsistencies. In the case study presented in the paper, Facebook’s infrastructure, which consists of hundreds of thousands of servers handling billions of requests per day from their massive user base, encountered a silent data corruption issue. The affected system processed user queries, image uploads, and media content, which required fast, reliable, and secure execution.\nThis case study illustrates how silent data corruption can propagate through multiple layers of an application stack, leading to data loss and application failures in a large-scale distributed system. The intermittent nature of the issue and the lack of explicit error messages made it particularly challenging to diagnose and resolve. But this is not restricted to just Meta, even other companies such as Google that operate AI hypercomputers face this challenge. Figure 17.2 Jeff Dean, Chief Scientist at Google DeepMind and Google Research, discusses SDCS and their impact on ML systems.\n\n\n\n\n\n\nFigure 17.2: Silent data corruption (SDC) errors are a major issue for AI hypercomputers. (Credit: Jeff Dean at MLSys 2024, Keynote (Google))\n\n\n\n\n\n17.2.2 Edge\nRegarding examples of faults and errors in edge ML systems, one area that has gathered significant attention is the domain of self-driving cars. Self-driving vehicles rely heavily on machine learning algorithms for perception, decision-making, and control, making them particularly susceptible to the impact of hardware and software faults. In recent years, several high-profile incidents involving autonomous vehicles have highlighted the challenges and risks associated with deploying these systems in real-world environments.\nIn May 2016, a fatal accident occurred when a Tesla Model S operating on Autopilot crashed into a white semi-trailer truck crossing the highway. The Autopilot system, which relied on computer vision and machine learning algorithms, failed to recognize the white trailer against a bright sky background. The driver, who was reportedly watching a movie when the crash, did not intervene in time, and the vehicle collided with the trailer at full speed. This incident raised concerns about the limitations of AI-based perception systems and the need for robust failsafe mechanisms in autonomous vehicles. It also highlighted the importance of driver awareness and the need for clear guidelines on using semi-autonomous driving features, as shown in Figure 17.3.\n\n\n\n\n\n\nFigure 17.3: Tesla in the fatal California crash was on Autopilot (Credit: BBC News)\n\n\n\nIn March 2018, an Uber self-driving test vehicle struck and killed a pedestrian crossing the street in Tempe, Arizona. The incident was caused by a software flaw in the vehicle’s object recognition system, which failed to identify the pedestrians appropriately to avoid them as obstacles. The safety driver, who was supposed to monitor the vehicle’s operation and intervene if necessary, was found distracted during the crash. This incident led to widespread scrutiny of Uber’s self-driving program and raised questions about the readiness of autonomous vehicle technology for public roads. It also emphasized the need for rigorous testing, validation, and safety measures in developing and deploying AI-based self-driving systems.\nIn 2021, Tesla faced increased scrutiny following several accidents involving vehicles operating on Autopilot mode. Some of these accidents were attributed to issues with the Autopilot system’s ability to detect and respond to certain road situations, such as stationary emergency vehicles or obstacles in the road. For example, in April 2021, a Tesla Model S crashed into a tree in Texas, killing two passengers. Initial reports suggested that no one was in the driver’s seat at the time of the crash, raising questions about the use and potential misuse of Autopilot features. These incidents highlight the ongoing challenges in developing robust and reliable autonomous driving systems and the need for clear regulations and consumer education regarding the capabilities and limitations of these technologies.\n\n\n17.2.3 Embedded\nEmbedded systems, which often operate in resource-constrained environments and safety-critical applications, have long faced challenges related to hardware and software faults. As AI and machine learning technologies are increasingly integrated into these systems, the potential for faults and errors takes on new dimensions, with the added complexity of AI algorithms and the critical nature of the applications in which they are deployed.\nLet’s consider a few examples, starting with outer space exploration. NASA’s Mars Polar Lander mission in 1999 suffered a catastrophic failure due to a software error in the touchdown detection system (Figure 17.4). The spacecraft’s onboard software mistakenly interpreted the noise from the deployment of its landing legs as a sign that it had touched down on the Martian surface. As a result, the spacecraft prematurely shut down its engines, causing it to crash into the surface. This incident highlights the critical importance of robust software design and extensive testing in embedded systems, especially those operating in remote and unforgiving environments. As AI capabilities are integrated into future space missions, ensuring these systems’ reliability and fault tolerance will be paramount to mission success.\n\n\n\n\n\n\nFigure 17.4: NASA’s Failed Mars Polar Lander mission in 1999 cost over $200M (Credit: SlashGear)\n\n\n\nBack on earth, in 2015, a Boeing 787 Dreamliner experienced a complete electrical shutdown during a flight due to a software bug in its generator control units. The bug caused the generator control units to enter a failsafe mode, cutting power to the aircraft’s electrical systems and forcing an emergency landing. This incident underscores the potential for software faults to have severe consequences in complex embedded systems like aircraft. As AI technologies are increasingly applied in aviation, such as in autonomous flight systems and predictive maintenance, ensuring the robustness and reliability of these systems will be critical to passenger safety.\nAs AI capabilities increasingly integrate into embedded systems, the potential for faults and errors becomes more complex and severe. Imagine a smart pacemaker that has a sudden glitch. A patient could die from that effect. Therefore, AI algorithms, such as those used for perception, decision-making, and control, introduce new sources of potential faults, such as data-related issues, model uncertainties, and unexpected behaviors in edge cases. Moreover, the opaque nature of some AI models can make it challenging to identify and diagnose faults when they occur.", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1923,9 +1923,9 @@ "href": "contents/robust_ai/robust_ai.html#hardware-faults", "title": "17  Robust AI", "section": "17.3 Hardware Faults", - "text": "17.3 Hardware Faults\nHardware faults are a significant challenge in computing systems, including traditional and ML systems. These faults occur when physical components, such as processors, memory modules, storage devices, or interconnects, malfunction or behave abnormally. Hardware faults can cause incorrect computations, data corruption, system crashes, or complete system failure, compromising the integrity and trustworthiness of the computations performed by the system (Jha et al. 2019). A complete system failure refers to a situation where the entire computing system becomes unresponsive or inoperable due to a critical hardware malfunction. This type of failure is the most severe, as it renders the system unusable and may lead to data loss or corruption, requiring manual intervention to repair or replace the faulty components.\nUnderstanding the taxonomy of hardware faults is essential for anyone working with computing systems, especially in the context of ML systems. ML systems rely on complex hardware architectures and large-scale computations to train and deploy models that learn from data and make intelligent predictions or decisions. However, hardware faults can introduce errors and inconsistencies in the MLOps pipeline, affecting the trained models’ accuracy, robustness, and reliability (G. Li et al. 2017).\nKnowing the different types of hardware faults, their mechanisms, and their potential impact on system behavior is crucial for developing effective strategies to detect, mitigate, and recover them. This knowledge is necessary for designing fault-tolerant computing systems, implementing robust ML algorithms, and ensuring the overall dependability of ML-based applications.\nThe following sections will explore the three main categories of hardware faults: transient, permanent, and intermittent. We will discuss their definitions, characteristics, causes, mechanisms, and examples of how they manifest in computing systems. We will also cover detection and mitigation techniques specific to each fault type.\n\nTransient Faults: Transient faults are temporary and non-recurring. They are often caused by external factors such as cosmic rays, electromagnetic interference, or power fluctuations. A common example of a transient fault is a bit flip, where a single bit in a memory location or register changes its value unexpectedly. Transient faults can lead to incorrect computations or data corruption, but they do not cause permanent damage to the hardware.\nPermanent Faults: Permanent faults, also called hard errors, are irreversible and persist over time. They are typically caused by physical defects or wear-out of hardware components. Examples of permanent faults include stuck-at faults, where a bit or signal is permanently set to a specific value (e.g., always 0 or always 1), and device failures, such as a malfunctioning processor or a damaged memory module. Permanent faults can result in complete system failure or significant performance degradation.\nIntermittent Faults: Intermittent faults are recurring faults that appear and disappear intermittently. Unstable hardware conditions, such as loose connections, aging components, or manufacturing defects, often cause them. Intermittent faults can be challenging to diagnose and reproduce because they may occur sporadically and under specific conditions. Examples include intermittent short circuits or contact resistance issues. Intermittent faults can lead to unpredictable system behavior and intermittent errors.\n\nBy the end of this discussion, readers will have a solid understanding of fault taxonomy and its relevance to traditional computing and ML systems. This foundation will help them make informed decisions when designing, implementing, and deploying fault-tolerant solutions, improving the reliability and trustworthiness of their computing systems and ML applications.\n\n17.3.1 Transient Faults\nTransient faults in hardware can manifest in various forms, each with its own unique characteristics and causes. These faults are temporary in nature and do not result in permanent damage to the hardware components.\n\nDefinition and Characteristics\nSome of the common types of transient faults include Single Event Upsets (SEUs) caused by ionizing radiation, voltage fluctuations (Reddi and Gupta 2013) due to power supply noise or electromagnetic interference, Electromagnetic Interference (EMI) induced by external electromagnetic fields, Electrostatic Discharge (ESD) resulting from sudden static electricity flow, crosstalk caused by unintended signal coupling, ground bounce triggered by simultaneous switching of multiple outputs, timing violations due to signal timing constraint breaches, and soft errors in combinational logic affecting the output of logic circuits (Mukherjee, Emer, and Reinhardt 2005). Understanding these different types of transient faults is crucial for designing robust and resilient hardware systems that can mitigate their impact and ensure reliable operation.\n\nReddi, Vijay Janapa, and Meeta Sharma Gupta. 2013. Resilient Architecture Design for Voltage Variation. Springer International Publishing. https://doi.org/10.1007/978-3-031-01739-1.\n\nMukherjee, S. S., J. Emer, and S. K. Reinhardt. 2005. “The Soft Error Problem: An Architectural Perspective.” In 11th International Symposium on High-Performance Computer Architecture, 243–47. IEEE; IEEE. https://doi.org/10.1109/hpca.2005.37.\nAll of these transient faults are characterized by their short duration and non-permanent nature. They do not persist or leave any lasting impact on the hardware. However, they can still lead to incorrect computations, data corruption, or system misbehavior if not properly handled.\n\n\n\nCauses of Transient Faults\nTransient faults can be attributed to various external factors. One common cause is cosmic rays, high-energy particles originating from outer space. When these particles strike sensitive areas of the hardware, such as memory cells or transistors, they can induce charge disturbances that alter the stored or transmitted data. This is illustrated in Figure 17.5. Another cause of transient faults is electromagnetic interference (EMI) from nearby devices or power fluctuations. EMI can couple with the circuits and cause voltage spikes or glitches that temporarily disrupt the normal operation of the hardware.\n\n\n\n\n\n\nFigure 17.5: Mechanism of Hardware Transient Fault Occurrence (Credit: NTT)\n\n\n\n\n\nMechanisms of Transient Faults\nTransient faults can manifest through different mechanisms depending on the affected hardware component. In memory devices like DRAM or SRAM, transient faults often lead to bit flips, where a single bit changes its value from 0 to 1 or vice versa. This can corrupt the stored data or instructions. In logic circuits, transient faults can cause glitches or voltage spikes propagating through the combinational logic, resulting in incorrect outputs or control signals. Transient faults can also affect communication channels, causing bit errors or packet losses during data transmission.\n\n\nImpact on ML Systems\nA common example of a transient fault is a bit flip in the main memory. If an important data structure or critical instruction is stored in the affected memory location, it can lead to incorrect computations or program misbehavior. If a transient fault occurs in the memory storing the model weights or gradients. For instance, a bit flip in the memory storing a loop counter can cause the loop to execute indefinitely or terminate prematurely. Transient faults in control registers or flag bits can alter the flow of program execution, leading to unexpected jumps or incorrect branch decisions. In communication systems, transient faults can corrupt transmitted data packets, resulting in retransmissions or data loss.\nIn ML systems, transient faults can have significant implications during the training phase (He et al. 2023). ML training involves iterative computations and updates to model parameters based on large datasets. If a transient fault occurs in the memory storing the model weights or gradients, it can lead to incorrect updates and compromise the convergence and accuracy of the training process. Figure 17.6 Show a real-world example from Google’s production fleet where an SDC anomaly caused a significant difference in the gradient norm.\n\n\n\n\n\n\nFigure 17.6: SDC in ML training phase results in anomalies in the gradient norm. (Credit: Jeff Dean, MLSys 2024 Keynote (Google))\n\n\n\nFor example, a bit flip in the weight matrix of a neural network can cause the model to learn incorrect patterns or associations, leading to degraded performance (Wan et al. 2021). Transient faults in the data pipeline, such as corruption of training samples or labels, can also introduce noise and affect the quality of the learned model.\n\nWan, Zishen, Aqeel Anwar, Yu-Shun Hsiao, Tianyu Jia, Vijay Janapa Reddi, and Arijit Raychowdhury. 2021. “Analyzing and Improving Fault Tolerance of Learning-Based Navigation Systems.” In 2021 58th ACM/IEEE Design Automation Conference (DAC), 841–46. IEEE; IEEE. https://doi.org/10.1109/dac18074.2021.9586116.\nDuring the inference phase, transient faults can impact the reliability and trustworthiness of ML predictions. If a transient fault occurs in the memory storing the trained model parameters or in the computation of the inference results, it can lead to incorrect or inconsistent predictions. For instance, a bit flip in the activation values of a neural network can alter the final classification or regression output (Mahmoud et al. 2020).\nIn safety-critical applications, such as autonomous vehicles or medical diagnosis, transient faults during inference can have severe consequences, leading to incorrect decisions or actions (G. Li et al. 2017; Jha et al. 2019). Ensuring the resilience of ML systems against transient faults is crucial to maintaining the integrity and reliability of the predictions.\n\nLi, Guanpeng, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, and Stephen W. Keckler. 2017. “Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1–12. ACM. https://doi.org/10.1145/3126908.3126964.\n\n\n\n17.3.2 Permanent Faults\nPermanent faults are hardware defects that persist and cause irreversible damage to the affected components. These faults are characterized by their persistent nature and require repair or replacement of the faulty hardware to restore normal system functionality.\n\nDefinition and Characteristics\nPermanent faults are hardware defects that cause persistent and irreversible malfunctions in the affected components. The faulty component remains non-operational until a permanent fault is repaired or replaced. These faults are characterized by their consistent and reproducible nature, meaning that the faulty behavior is observed every time the affected component is used. Permanent faults can impact various hardware components, such as processors, memory modules, storage devices, or interconnects, leading to system crashes, data corruption, or complete system failure.\nOne notable example of a permanent fault is the Intel FDIV bug, which was discovered in 1994. The FDIV bug was a flaw in certain Intel Pentium processors’ floating-point division (FDIV) units. The bug caused incorrect results for specific division operations, leading to inaccurate calculations.\nThe FDIV bug occurred due to an error in the lookup table used by the division unit. In rare cases, the processor would fetch an incorrect value from the lookup table, resulting in a slightly less precise result than expected. For instance, Figure 17.7 shows a fraction 4195835/3145727 plotted on a Pentium processor with the FDIV permanent fault. The triangular regions are where erroneous calculations occurred. Ideally, all correct values would round to 1.3338, but the erroneous results show 1.3337, indicating a mistake in the 5th digit.\nAlthough the error was small, it could compound over many division operations, leading to significant inaccuracies in mathematical calculations. The impact of the FDIV bug was significant, especially for applications that relied heavily on precise floating-point division, such as scientific simulations, financial calculations, and computer-aided design. The bug led to incorrect results, which could have severe consequences in fields like finance or engineering.\n\n\n\n\n\n\nFigure 17.7: Intel Pentium processor with the FDIV permanent fault. The triangular regions are where erroneous calculations occurred. (Credit: Byte Magazine)\n\n\n\nThe Intel FDIV bug is a cautionary tale for the potential impact of permanent faults on ML systems. In the context of ML, permanent faults in hardware components can lead to incorrect computations, affecting the accuracy and reliability of the models. For example, if an ML system relies on a processor with a faulty floating-point unit, similar to the Intel FDIV bug, it could introduce errors in the calculations performed during training or inference.\nThese errors can propagate through the model, leading to inaccurate predictions or skewed learning. In applications where ML is used for critical tasks, such as autonomous driving, medical diagnosis, or financial forecasting, the consequences of incorrect computations due to permanent faults can be severe.\nIt is crucial for ML practitioners to be aware of the potential impact of permanent faults and to incorporate fault-tolerant techniques, such as hardware redundancy, error detection and correction mechanisms, and robust algorithm design, to mitigate the risks associated with these faults. Additionally, thorough testing and validation of ML hardware components can help identify and address permanent faults before they impact the system’s performance and reliability.\n\n\nCauses of Permanent Faults\nPermanent faults can arise from several causes, including manufacturing defects and wear-out mechanisms. Manufacturing defects are inherent flaws introduced during the fabrication process of hardware components. These defects include improper etching, incorrect doping, or contamination, leading to non-functional or partially functional components.\nOn the other hand, wear-out mechanisms occur over time as the hardware components are subjected to prolonged use and stress. Factors such as electromigration, oxide breakdown, or thermal stress can cause gradual degradation of the components, eventually leading to permanent failures.\n\n\nMechanisms of Permanent Faults\nPermanent faults can manifest through various mechanisms, depending on the nature and location of the fault. Stuck-at faults (Seong et al. 2010) are common permanent faults where a signal or memory cell remains fixed at a particular value (either 0 or 1) regardless of the inputs, as illustrated in Figure 17.8.\n\nSeong, Nak Hee, Dong Hyuk Woo, Vijayalakshmi Srinivasan, Jude A. Rivers, and Hsien-Hsin S. Lee. 2010. “SAFER: Stuck-at-fault Error Recovery for Memories.” In 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 115–24. IEEE; IEEE. https://doi.org/10.1109/micro.2010.46.\n\n\n\n\n\n\nFigure 17.8: Stuck-at Fault Model in Digital Circuits (Credit: Accendo Reliability)\n\n\n\nStuck-at faults can occur in logic gates, memory cells, or interconnects, causing incorrect computations or data corruption. Another mechanism is device failures, where a component, such as a transistor or a memory cell, completely ceases to function. This can be due to manufacturing defects or severe wear-out. Bridging faults occur when two or more signal lines are unintentionally connected, causing short circuits or incorrect logic behavior.\nIn addition to stuck-at faults, there are several other types of permanent faults that can affect digital circuits that can impact an ML system. Delay faults can cause the propagation delay of a signal to exceed the specified limit, leading to timing violations. Interconnect faults, such as open faults (broken wires), resistive faults (increased resistance), or capacitive faults (increased capacitance), can cause signal integrity issues or timing violations. Memory cells can also suffer from various faults, including transition faults (inability to change state), coupling faults (interference between adjacent cells), and neighborhood pattern sensitive faults (faults that depend on the values of neighboring cells). Other permanent faults can occur in the power supply network or the clock distribution network, affecting the functionality and timing of the circuit.\n\n\nImpact on ML Systems\nPermanent faults can severely affect the behavior and reliability of computing systems. For example, a stuck-at-fault in a processor’s arithmetic logic unit (ALU) can cause incorrect computations, leading to erroneous results or system crashes. A permanent fault in a memory module, such as a stuck-at fault in a specific memory cell, can corrupt the stored data, causing data loss or program misbehavior. In storage devices, permanent faults like bad sectors or device failures can result in data inaccessibility or complete loss of stored information. Permanent interconnect faults can disrupt communication channels, causing data corruption or system hangs.\nPermanent faults can significantly affect ML systems during the training and inference phases. During training, permanent faults in processing units or memory can lead to incorrect computations, resulting in corrupted or suboptimal models (He et al. 2023). Furthermore, faults in storage devices can corrupt the training data or the stored model parameters, leading to data loss or model inconsistencies (He et al. 2023).\n\nZhang, Jeff Jun, Tianyu Gu, Kanad Basu, and Siddharth Garg. 2018. “Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator.” In 2018 IEEE 36th VLSI Test Symposium (VTS), 1–6. IEEE; IEEE. https://doi.org/10.1109/vts.2018.8368656.\nDuring inference, permanent faults can impact the reliability and correctness of ML predictions. Faults in the processing units can produce incorrect results or cause system failures, while faults in memory storing the model parameters can lead to corrupted or outdated models being used for inference (J. J. Zhang et al. 2018).\nTo mitigate the impact of permanent faults in ML systems, fault-tolerant techniques must be employed at both the hardware and software levels. Hardware redundancy, such as duplicating critical components or using error-correcting codes (Kim, Sullivan, and Erez 2015), can help detect and recover from permanent faults. Software techniques, such as checkpoint and restart mechanisms (Egwutuoha et al. 2013), can enable the system to recover from permanent faults by returning to a previously saved state. Regular monitoring, testing, and maintenance of ML systems can help identify and replace faulty components before they cause significant disruptions.\n\nKim, Jungrae, Michael Sullivan, and Mattan Erez. 2015. “Bamboo ECC: Strong, Safe, and Flexible Codes for Reliable Computer Memory.” In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), 101–12. IEEE; IEEE. https://doi.org/10.1109/hpca.2015.7056025.\n\nEgwutuoha, Ifeanyi P., David Levy, Bran Selic, and Shiping Chen. 2013. “A Survey of Fault Tolerance Mechanisms and Checkpoint/Restart Implementations for High Performance Computing Systems.” The Journal of Supercomputing 65 (3): 1302–26. https://doi.org/10.1007/s11227-013-0884-0.\nDesigning ML systems with fault tolerance in mind is crucial to ensure their reliability and robustness in the presence of permanent faults. This may involve incorporating redundancy, error detection and correction mechanisms, and fail-safe strategies into the system architecture. By proactively addressing the challenges posed by permanent faults, ML systems can maintain their integrity, accuracy, and trustworthiness, even in the face of hardware failures.\n\n\n\n17.3.3 Intermittent Faults\nIntermittent faults are hardware faults that occur sporadically and unpredictably in a system. An example is illustrated in Figure 17.9, where cracks in the material can introduce increased resistance in circuitry. These faults are particularly challenging to detect and diagnose because they appear and disappear intermittently, making it difficult to reproduce and isolate the root cause. Intermittent faults can lead to system instability, data corruption, and performance degradation.\n\n\n\n\n\n\nFigure 17.9: Increased resistance due to an intermittent fault – crack between copper bump and package solder (Credit: Constantinescu)\n\n\n\n\nDefinition and Characteristics\nIntermittent faults are characterized by their sporadic and non-deterministic nature. They occur irregularly and may appear and disappear spontaneously, with varying durations and frequencies. These faults do not consistently manifest every time the affected component is used, making them harder to detect than permanent faults. Intermittent faults can affect various hardware components, including processors, memory modules, storage devices, or interconnects. They can cause transient errors, data corruption, or unexpected system behavior.\nIntermittent faults can significantly impact the behavior and reliability of computing systems (Rashid, Pattabiraman, and Gopalakrishnan 2015). For example, an intermittent fault in a processor’s control logic can cause irregular program flow, leading to incorrect computations or system hangs. Intermittent faults in memory modules can corrupt data values, resulting in erroneous program execution or data inconsistencies. In storage devices, intermittent faults can cause read/write errors or data loss. Intermittent faults in communication channels can lead to data corruption, packet loss, or intermittent connectivity issues. These faults can cause system crashes, data integrity problems, or performance degradation, depending on the severity and frequency of the intermittent failures.\n\n———. 2015. “Characterizing the Impact of Intermittent Hardware Faults on Programs.” IEEE Trans. Reliab. 64 (1): 297–310. https://doi.org/10.1109/tr.2014.2363152.\n\n\nCauses of Intermittent Faults\nIntermittent faults can arise from several causes, both internal and external, to the hardware components (Constantinescu 2008). One common cause is aging and wear-out of the components. As electronic devices age, they become more susceptible to intermittent failures due to degradation mechanisms such as electromigration, oxide breakdown, or solder joint fatigue.\n\nConstantinescu, Cristian. 2008. “Intermittent Faults and Effects on Reliability of Integrated Circuits.” In 2008 Annual Reliability and Maintainability Symposium, 370–74. IEEE; IEEE. https://doi.org/10.1109/rams.2008.4925824.\nManufacturing defects or process variations can also introduce intermittent faults, where marginal or borderline components may exhibit sporadic failures under specific conditions, as shown in Figure 17.10.\nEnvironmental factors, such as temperature fluctuations, humidity, or vibrations, can trigger intermittent faults by altering the electrical characteristics of the components. Loose or degraded connections, such as those in connectors or printed circuit boards, can cause intermittent faults.\n\n\n\n\n\n\nFigure 17.10: Residue induced intermittent fault in a DRAM chip (Credit: Hynix Semiconductor)\n\n\n\n\n\nMechanisms of Intermittent Faults\nIntermittent faults can manifest through various mechanisms, depending on the underlying cause and the affected component. One mechanism is the intermittent open or short circuit, where a signal path or connection becomes temporarily disrupted or shorted, causing erratic behavior. Another mechanism is the intermittent delay fault (J. Zhang et al. 2018), where the timing of signals or propagation delays becomes inconsistent, leading to synchronization issues or incorrect computations. Intermittent faults can manifest as transient bit flips or soft errors in memory cells or registers, causing data corruption or incorrect program execution.\n\nZhang, Jeff, Kartheek Rangineni, Zahra Ghodsi, and Siddharth Garg. 2018. “ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Learning Accelerators.” In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465918.\n\n\nImpact on ML Systems\nIn the context of ML systems, intermittent faults can introduce significant challenges and impact the system’s reliability and performance. During the training phase, intermittent faults in processing units or memory can lead to inconsistencies in computations, resulting in incorrect or noisy gradients and weight updates. This can affect the convergence and accuracy of the training process, leading to suboptimal or unstable models. Intermittent data storage or retrieval faults can corrupt the training data, introducing noise or errors that degrade the quality of the learned models (He et al. 2023).\n\nHe, Yi, Mike Hutton, Steven Chan, Robert De Gruijl, Rama Govindaraju, Nishant Patil, and Yanjing Li. 2023. “Understanding and Mitigating Hardware Failures in Deep Learning Training Systems.” In Proceedings of the 50th Annual International Symposium on Computer Architecture, 1–16. IEEE; ACM. https://doi.org/10.1145/3579371.3589105.\nDuring the inference phase, intermittent faults can impact the reliability and consistency of ML predictions. Faults in the processing units or memory can cause incorrect computations or data corruption, leading to erroneous or inconsistent predictions. Intermittent faults in the data pipeline can introduce noise or errors in the input data, affecting the accuracy and robustness of the predictions. In safety-critical applications, such as autonomous vehicles or medical diagnosis systems, intermittent faults can have severe consequences, leading to incorrect decisions or actions that compromise safety and reliability.\nMitigating the impact of intermittent faults in ML systems requires a multifaceted approach (Rashid, Pattabiraman, and Gopalakrishnan 2012). At the hardware level, techniques such as robust design practices, component selection, and environmental control can help reduce the occurrence of intermittent faults. Redundancy and error correction mechanisms can be employed to detect and recover from intermittent failures. At the software level, runtime monitoring, anomaly detection, and fault-tolerant techniques can be incorporated into the ML pipeline. This may include techniques such as data validation, outlier detection, model ensembling, or runtime model adaptation to handle intermittent faults gracefully.\n\nRashid, Layali, Karthik Pattabiraman, and Sathish Gopalakrishnan. 2012. “Intermittent Hardware Errors Recovery: Modeling and Evaluation.” In 2012 Ninth International Conference on Quantitative Evaluation of Systems, 220–29. IEEE; IEEE. https://doi.org/10.1109/qest.2012.37.\nDesigning ML systems resilient to intermittent faults is crucial to ensuring their reliability and robustness. This involves incorporating fault-tolerant techniques, runtime monitoring, and adaptive mechanisms into the system architecture. By proactively addressing the challenges of intermittent faults, ML systems can maintain their accuracy, consistency, and trustworthiness, even in sporadic hardware failures. Regular testing, monitoring, and maintenance of ML systems can help identify and mitigate intermittent faults before they cause significant disruptions or performance degradation.\n\n\n\n17.3.4 Detection and Mitigation\nThis section explores various fault detection techniques, including hardware-level and software-level approaches, and discusses effective mitigation strategies to enhance the resilience of ML systems. Additionally, we will look into resilient ML system design considerations, present case studies and examples, and highlight future research directions in fault-tolerant ML systems.\n\nFault Detection Techniques\nFault detection techniques are important for identifying and localizing hardware faults in ML systems. These techniques can be broadly categorized into hardware-level and software-level approaches, each offering unique capabilities and advantages.\n\nHardware-level fault detection\nHardware-level fault detection techniques are implemented at the physical level of the system and aim to identify faults in the underlying hardware components. There are several hardware techniques, but broadly, we can bucket these different mechanisms into the following categories.\nBuilt-in self-test (BIST) mechanisms: BIST is a powerful technique for detecting faults in hardware components (Bushnell and Agrawal 2002). It involves incorporating additional hardware circuitry into the system for self-testing and fault detection. BIST can be applied to various components, such as processors, memory modules, or application-specific integrated circuits (ASICs). For example, BIST can be implemented in a processor using scan chains, which are dedicated paths that allow access to internal registers and logic for testing purposes.\n\nBushnell, Michael L, and Vishwani D Agrawal. 2002. “Built-in Self-Test.” Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, 489–548.\nDuring the BIST process, predefined test patterns are applied to the processor’s internal circuitry, and the responses are compared against expected values. Any discrepancies indicate the presence of faults. Intel’s Xeon processors, for instance, include BIST mechanisms to test the CPU cores, cache memory, and other critical components during system startup.\nError detection codes: Error detection codes are widely used to detect data storage and transmission errors (Hamming 1950). These codes add redundant bits to the original data, allowing the detection of bit errors. Example: Parity checks are a simple form of error detection code shown in Figure 17.11. In a single-bit parity scheme, an extra bit is appended to each data word, making the number of 1s in the word even (even parity) or odd (odd parity).\n\nHamming, R. W. 1950. “Error Detecting and Error Correcting Codes.” Bell Syst. Tech. J. 29 (2): 147–60. https://doi.org/10.1002/j.1538-7305.1950.tb00463.x.\n\n\n\n\n\n\nFigure 17.11: Parity bit example (Credit: Computer Hope)\n\n\n\nWhen reading the data, the parity is checked, and if it doesn’t match the expected value, an error is detected. More advanced error detection codes, such as cyclic redundancy checks (CRC), calculate a checksum based on the data and append it to the message. The checksum is recalculated at the receiving end and compared with the transmitted checksum to detect errors. Error-correcting code (ECC) memory modules, commonly used in servers and critical systems, employ advanced error detection and correction codes to detect and correct single-bit or multi-bit errors in memory.\nHardware redundancy and voting mechanisms: Hardware redundancy involves duplicating critical components and comparing their outputs to detect and mask faults (Sheaffer, Luebke, and Skadron 2007). Voting mechanisms, such as triple modular redundancy (TMR), employ multiple instances of a component and compare their outputs to identify and mask faulty behavior (Arifeen, Hassan, and Lee 2020).\n\nSheaffer, Jeremy W, David P Luebke, and Kevin Skadron. 2007. “A Hardware Redundancy and Recovery Mechanism for Reliable Scientific Computation on Graphics Processors.” In Graphics Hardware, 2007:55–64. Citeseer.\n\nArifeen, Tooba, Abdus Sami Hassan, and Jeong-A Lee. 2020. “Approximate Triple Modular Redundancy: A Survey.” #IEEE_O_ACC# 8: 139851–67. https://doi.org/10.1109/access.2020.3012673.\n\nYeh, Y. C. 1996. “Triple-Triple Redundant 777 Primary Flight Computer.” In 1996 IEEE Aerospace Applications Conference. Proceedings, 1:293–307. IEEE; IEEE. https://doi.org/10.1109/aero.1996.495891.\nIn a TMR system, three identical instances of a hardware component, such as a processor or a sensor, perform the same computation in parallel. The outputs of these instances are fed into a voting circuit, which compares the results and selects the majority value as the final output. If one of the instances produces an incorrect result due to a fault, the voting mechanism masks the error and maintains the correct output. TMR is commonly used in aerospace and aviation systems, where high reliability is critical. For instance, the Boeing 777 aircraft employs TMR in its primary flight computer system to ensure the availability and correctness of flight control functions (Yeh 1996).\nTesla’s self-driving computers employ a redundant hardware architecture to ensure the safety and reliability of critical functions, such as perception, decision-making, and vehicle control, as shown in Figure 17.12. One key component of this architecture is using dual modular redundancy (DMR) in the car’s onboard computer systems.\n\n\n\n\n\n\nFigure 17.12: Tesla full self-driving computer with dual redundant SoCs (Credit: Tesla)\n\n\n\nIn Tesla’s DMR implementation, two identical hardware units, often called “redundant computers” or “redundant control units,” perform the same computations in parallel (Bannon et al. 2019). Each unit independently processes sensor data, executes perception and decision-making algorithms, and generates control commands for the vehicle’s actuators (e.g., steering, acceleration, and braking).\n\nBannon, Pete, Ganesh Venkataramanan, Debjit Das Sarma, and Emil Talpes. 2019. “Computer and Redundancy Solution for the Full Self-Driving Computer.” In 2019 IEEE Hot Chips 31 Symposium (HCS), 1–22. IEEE Computer Society; IEEE. https://doi.org/10.1109/hotchips.2019.8875645.\nThe outputs of these two redundant units are continuously compared to detect any discrepancies or faults. If the outputs match, the system assumes that both units function correctly, and the control commands are sent to the vehicle’s actuators. However, if there is a mismatch between the outputs, the system identifies a potential fault in one of the units and takes appropriate action to ensure safe operation.\nThe system may employ additional mechanisms to determine which unit is faulty in a mismatch. This can involve using diagnostic algorithms, comparing the outputs with data from other sensors or subsystems, or analyzing the consistency of the outputs over time. Once the faulty unit is identified, the system can isolate it and continue operating using the output from the non-faulty unit.\nDMR in Tesla’s self-driving computer provides an extra safety and fault tolerance layer. By having two independent units performing the same computations, the system can detect and mitigate faults that may occur in one of the units. This redundancy helps prevent single points of failure and ensures that critical functions remain operational despite hardware faults.\nFurthermore, Tesla also incorporates additional redundancy mechanisms beyond DMR. For example, they utilize redundant power supplies, steering and braking systems, and diverse sensor suites (e.g., cameras, radar, and ultrasonic sensors) to provide multiple layers of fault tolerance. These redundancies collectively contribute to the overall safety and reliability of the self-driving system.\nIt’s important to note that while DMR provides fault detection and some level of fault tolerance, TMR may provide a different level of fault masking. In DMR, if both units experience simultaneous faults or the fault affects the comparison mechanism, the system may be unable to identify the fault. Therefore, Tesla’s SDCs rely on a combination of DMR and other redundancy mechanisms to achieve a high level of fault tolerance.\nThe use of DMR in Tesla’s self-driving computer highlights the importance of hardware redundancy in safety-critical applications. By employing redundant computing units and comparing their outputs, the system can detect and mitigate faults, enhancing the overall safety and reliability of the self-driving functionality.\nGoogle employs redundant hot spares to deal with SDC issues within its data centers, thereby enhancing the reliability of critical functions. As illustrated in Figure 17.13, during the normal training phase, multiple synchronous training workers function flawlessly. However, if a worker becomes defective and causes SDC, an SDC checker automatically identifies the issues. Upon detecting the SDC, the SDC checker moves the training to a hot spare and sends the defective machine for repair. This redundancy safeguards the continuity and reliability of ML training, effectively minimizing downtime and preserving data integrity.\n\n\n\n\n\n\nFigure 17.13: Google employs hot spare cores to transparently handle SDCs in the data center. (Credit: Jeff Dean, MLSys 2024 Keynote (Google))\n\n\n\nWatchdog timers: Watchdog timers are hardware components that monitor the execution of critical tasks or processes (Pont and Ong 2002). They are commonly used to detect and recover from software or hardware faults that cause a system to become unresponsive or stuck in an infinite loop. In an embedded system, a watchdog timer can be configured to monitor the execution of the main control loop, as illustrated in Figure 17.14. The software periodically resets the watchdog timer to indicate that it functions correctly. Suppose the software fails to reset the timer within a specified time limit (timeout period). In that case, the watchdog timer assumes that the system has encountered a fault and triggers a predefined recovery action, such as resetting the system or switching to a backup component. Watchdog timers are widely used in automotive electronics, industrial control systems, and other safety-critical applications to ensure the timely detection and recovery from faults.\n\nPont, Michael J, and Royan HL Ong. 2002. “Using Watchdog Timers to Improve the Reliability of Single-Processor Embedded Systems: Seven New Patterns and a Case Study.” In Proceedings of the First Nordic Conference on Pattern Languages of Programs, 159–200. Citeseer.\n\n\n\n\n\n\nFigure 17.14: Watchdog timer example in detecting MCU faults (Credit: Ablic)\n\n\n\n\n\nSoftware-level fault detection\nSoftware-level fault detection techniques rely on software algorithms and monitoring mechanisms to identify system faults. These techniques can be implemented at various levels of the software stack, including the operating system, middleware, or application level.\nRuntime monitoring and anomaly detection: Runtime monitoring involves continuously observing the behavior of the system and its components during execution (Francalanza et al. 2017). It helps detect anomalies, errors, or unexpected behavior that may indicate the presence of faults. For example, consider an ML-based image classification system deployed in a self-driving car. Runtime monitoring can be implemented to track the classification model’s performance and behavior (Mahmoud et al. 2021).\n\nFrancalanza, Adrian, Luca Aceto, Antonis Achilleos, Duncan Paul Attard, Ian Cassar, Dario Della Monica, and Anna Ingólfsdóttir. 2017. “A Foundation for Runtime Monitoring.” In International Conference on Runtime Verification, 8–29. Springer.\n\nMahmoud, Abdulrahman, Siva Kumar Sastry Hari, Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael B. Sullivan, Timothy Tsai, and Stephen W. Keckler. 2021. “Optimizing Selective Protection for CNN Resilience.” In 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 127–38. IEEE. https://doi.org/10.1109/issre52982.2021.00025.\n\nChandola, Varun, Arindam Banerjee, and Vipin Kumar. 2009. “Anomaly Detection: A Survey.” ACM Comput. Surv. 41 (3): 1–58. https://doi.org/10.1145/1541880.1541882.\nAnomaly detection algorithms can be applied to the model’s predictions or intermediate layer activations, such as statistical outlier detection or machine learning-based approaches (e.g., One-Class SVM or Autoencoders) (Chandola, Banerjee, and Kumar 2009). Figure 17.15 shows example of anomaly detection. Suppose the monitoring system detects a significant deviation from the expected patterns, such as a sudden drop in classification accuracy or out-of-distribution samples. In that case, it can raise an alert indicating a potential fault in the model or the input data pipeline. This early detection allows for timely intervention and fault mitigation strategies to be applied.\n\n\n\n\n\n\nFigure 17.15: Examples of anomaly detection. (a) Fully supervised anomaly detection, (b) normal-only anomaly detection, (c, d, e) semi-supervised anomaly detection, (f) unsupervised anomaly detection (Credit: Google)\n\n\n\nConsistency checks and data validation: Consistency checks and data validation techniques ensure data integrity and correctness at different processing stages in an ML system (Lindholm et al. 2019). These checks help detect data corruption, inconsistencies, or errors that may propagate and affect the system’s behavior. Example: In a distributed ML system where multiple nodes collaborate to train a model, consistency checks can be implemented to validate the integrity of the shared model parameters. Each node can compute a checksum or hash of the model parameters before and after the training iteration, as shown in Figure 17.15. Any inconsistencies or data corruption can be detected by comparing the checksums across nodes. Additionally, range checks can be applied to the input data and model outputs to ensure they fall within expected bounds. For instance, if an autonomous vehicle’s perception system detects an object with unrealistic dimensions or velocities, it can indicate a fault in the sensor data or the perception algorithms (Wan et al. 2023).\n\nLindholm, Andreas, Dave Zachariah, Petre Stoica, and Thomas B. Schon. 2019. “Data Consistency Approach to Model Validation.” #IEEE_O_ACC# 7: 59788–96. https://doi.org/10.1109/access.2019.2915109.\n\nWan, Zishen, Yiming Gan, Bo Yu, S Liu, A Raychowdhury, and Y Zhu. 2023. “Vpp: The Vulnerability-Proportional Protection Paradigm Towards Reliable Autonomous Machines.” In Proceedings of the 5th International Workshop on Domain Specific System Architecture (DOSSA), 1–6.\n\nKawazoe Aguilera, Marcos, Wei Chen, and Sam Toueg. 1997. “Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication.” In Distributed Algorithms: 11th International Workshop, WDAG’97 Saarbrücken, Germany, September 2426, 1997 Proceedings 11, 126–40. Springer.\nHeartbeat and timeout mechanisms: Heartbeat mechanisms and timeouts are commonly used to detect faults in distributed systems and ensure the liveness and responsiveness of components (Kawazoe Aguilera, Chen, and Toueg 1997). These are quite similar to the watchdog timers found in hardware. For example, in a distributed ML system, where multiple nodes collaborate to perform tasks such as data preprocessing, model training, or inference, heartbeat mechanisms can be implemented to monitor the health and availability of each node. Each node periodically sends a heartbeat message to a central coordinator or its peer nodes, indicating its status and availability. Suppose a node fails to send a heartbeat within a specified timeout period, as shown in Figure 17.16. In that case, it is considered faulty, and appropriate actions can be taken, such as redistributing the workload or initiating a failover mechanism. Timeouts can also be used to detect and handle hanging or unresponsive components. For example, if a data loading process exceeds a predefined timeout threshold, it may indicate a fault in the data pipeline, and the system can take corrective measures.\n\n\n\n\n\n\nFigure 17.16: Heartbeat messages in distributed systems (Credit: GeeksforGeeks)\n\n\n\n\nSoftware-implemented fault tolerance (SIFT) techniques: SIFT techniques introduce redundancy and fault detection mechanisms at the software level to enhance the reliability and fault tolerance of the system (Reis et al. 2005). Example: N-version programming is a SIFT technique where multiple functionally equivalent software component versions are developed independently by different teams. This can be applied to critical components such as the model inference engine in an ML system. Multiple versions of the inference engine can be executed in parallel, and their outputs can be compared for consistency. It is considered the correct result if most versions produce the same output. If there is a discrepancy, it indicates a potential fault in one or more versions, and appropriate error-handling mechanisms can be triggered. Another example is using software-based error correction codes, such as Reed-Solomon codes (Plank 1997), to detect and correct errors in data storage or transmission, as shown in Figure 17.17. These codes add redundancy to the data, enabling detecting and correcting certain errors and enhancing the system’s fault tolerance.\n\nReis, G. A., J. Chang, N. Vachharajani, R. Rangan, and D. I. August. 2005. “SWIFT: Software Implemented Fault Tolerance.” In International Symposium on Code Generation and Optimization, 243–54. IEEE; IEEE. https://doi.org/10.1109/cgo.2005.34.\n\nPlank, James S. 1997. “A Tutorial on ReedSolomon Coding for Fault-Tolerance in RAID-Like Systems.” Software: Practice and Experience 27 (9): 995–1012.\n\n\n\n\n\n\nFigure 17.17: n-bits representation of the Reed-Solomon codes (Credit: GeeksforGeeks)\n\n\n\n\n\n\n\n\n\nExercise 17.1: Anomaly Detection\n\n\n\n\n\nIn this Colab, play the role of an AI fault detective! You’ll build an autoencoder-based anomaly detector to pinpoint errors in heart health data. Learn how to identify malfunctions in ML systems, a vital skill for creating dependable AI. We’ll use Keras Tuner to fine-tune your autoencoder for top-notch fault detection. This experience directly links to the Robust AI chapter, demonstrating the importance of fault detection in real-world applications like healthcare and autonomous systems. Get ready to strengthen the reliability of your AI creations!\n\n\n\n\n\n\n\n\n17.3.5 Summary\nTable 17.1 provides an extensive comparative analysis of transient, permanent, and intermittent faults. It outlines the primary characteristics or dimensions that distinguish these fault types. Here, we summarize the relevant dimensions we examined and explore the nuances that differentiate transient, permanent, and intermittent faults in greater detail.\n\n\n\nTable 17.1: Comparison of transient, permanent, and intermittent faults.\n\n\n\n\n\n\n\n\n\n\n\nDimension\nTransient Faults\nPermanent Faults\nIntermittent Faults\n\n\n\n\nDuration\nShort-lived, temporary\nPersistent, remains until repair or replacement\nSporadic, appears and disappears intermittently\n\n\nPersistence\nDisappears after the fault condition passes\nConsistently present until addressed\nRecurs irregularly, not always present\n\n\nCauses\nExternal factors (e.g., electromagnetic interference, cosmic rays)\nHardware defects, physical damage, wear-out\nUnstable hardware conditions, loose connections, aging components\n\n\nManifestation\nBit flips, glitches, temporary data corruption\nStuck-at faults, broken components, complete device failures\nOccasional bit flips, intermittent signal issues, sporadic malfunctions\n\n\nImpact on ML Systems\nIntroduces temporary errors or noise in computations\nCauses consistent errors or failures, affecting reliability\nLeads to sporadic and unpredictable errors, challenging to diagnose and mitigate\n\n\nDetection\nError detection codes, comparison with expected values\nBuilt-in self-tests, error detection codes, consistency checks\nMonitoring for anomalies, analyzing error patterns and correlations\n\n\nMitigation\nError correction codes, redundancy, checkpoint and restart\nHardware repair or replacement, component redundancy, failover mechanisms\nRobust design, environmental control, runtime monitoring, fault-tolerant techniques", + "text": "17.3 Hardware Faults\nHardware faults are a significant challenge in computing systems, including traditional and ML systems. These faults occur when physical components, such as processors, memory modules, storage devices, or interconnects, malfunction or behave abnormally. Hardware faults can cause incorrect computations, data corruption, system crashes, or complete system failure, compromising the integrity and trustworthiness of the computations performed by the system (Jha et al. 2019). A complete system failure refers to a situation where the entire computing system becomes unresponsive or inoperable due to a critical hardware malfunction. This type of failure is the most severe, as it renders the system unusable and may lead to data loss or corruption, requiring manual intervention to repair or replace the faulty components.\nUnderstanding the taxonomy of hardware faults is essential for anyone working with computing systems, especially in the context of ML systems. ML systems rely on complex hardware architectures and large-scale computations to train and deploy models that learn from data and make intelligent predictions or decisions. However, hardware faults can introduce errors and inconsistencies in the MLOps pipeline, affecting the trained models’ accuracy, robustness, and reliability (G. Li et al. 2017).\nKnowing the different types of hardware faults, their mechanisms, and their potential impact on system behavior is crucial for developing effective strategies to detect, mitigate, and recover them. This knowledge is necessary for designing fault-tolerant computing systems, implementing robust ML algorithms, and ensuring the overall dependability of ML-based applications.\nThe following sections will explore the three main categories of hardware faults: transient, permanent, and intermittent. We will discuss their definitions, characteristics, causes, mechanisms, and examples of how they manifest in computing systems. We will also cover detection and mitigation techniques specific to each fault type.\n\nTransient Faults: Transient faults are temporary and non-recurring. They are often caused by external factors such as cosmic rays, electromagnetic interference, or power fluctuations. A common example of a transient fault is a bit flip, where a single bit in a memory location or register changes its value unexpectedly. Transient faults can lead to incorrect computations or data corruption, but they do not cause permanent damage to the hardware.\nPermanent Faults: Permanent faults, also called hard errors, are irreversible and persist over time. They are typically caused by physical defects or wear-out of hardware components. Examples of permanent faults include stuck-at faults, where a bit or signal is permanently set to a specific value (e.g., always 0 or always 1), and device failures, such as a malfunctioning processor or a damaged memory module. Permanent faults can result in complete system failure or significant performance degradation.\nIntermittent Faults: Intermittent faults are recurring faults that appear and disappear intermittently. Unstable hardware conditions, such as loose connections, aging components, or manufacturing defects, often cause them. Intermittent faults can be challenging to diagnose and reproduce because they may occur sporadically and under specific conditions. Examples include intermittent short circuits or contact resistance issues. Intermittent faults can lead to unpredictable system behavior and intermittent errors.\n\nBy the end of this discussion, readers will have a solid understanding of fault taxonomy and its relevance to traditional computing and ML systems. This foundation will help them make informed decisions when designing, implementing, and deploying fault-tolerant solutions, improving the reliability and trustworthiness of their computing systems and ML applications.\n\n17.3.1 Transient Faults\nTransient faults in hardware can manifest in various forms, each with its own unique characteristics and causes. These faults are temporary in nature and do not result in permanent damage to the hardware components.\n\nDefinition and Characteristics\nSome of the common types of transient faults include Single Event Upsets (SEUs) caused by ionizing radiation, voltage fluctuations (Reddi and Gupta 2013) due to power supply noise or electromagnetic interference, Electromagnetic Interference (EMI) induced by external electromagnetic fields, Electrostatic Discharge (ESD) resulting from sudden static electricity flow, crosstalk caused by unintended signal coupling, ground bounce triggered by simultaneous switching of multiple outputs, timing violations due to signal timing constraint breaches, and soft errors in combinational logic affecting the output of logic circuits (Mukherjee, Emer, and Reinhardt 2005). Understanding these different types of transient faults is crucial for designing robust and resilient hardware systems that can mitigate their impact and ensure reliable operation.\n\nReddi, Vijay Janapa, and Meeta Sharma Gupta. 2013. Resilient Architecture Design for Voltage Variation. Springer International Publishing. https://doi.org/10.1007/978-3-031-01739-1.\n\nMukherjee, S. S., J. Emer, and S. K. Reinhardt. 2005. “The Soft Error Problem: An Architectural Perspective.” In 11th International Symposium on High-Performance Computer Architecture, 243–47. IEEE; IEEE. https://doi.org/10.1109/hpca.2005.37.\nAll of these transient faults are characterized by their short duration and non-permanent nature. They do not persist or leave any lasting impact on the hardware. However, they can still lead to incorrect computations, data corruption, or system misbehavior if not properly handled.\n\n\n\nCauses of Transient Faults\nTransient faults can be attributed to various external factors. One common cause is cosmic rays, high-energy particles originating from outer space. When these particles strike sensitive areas of the hardware, such as memory cells or transistors, they can induce charge disturbances that alter the stored or transmitted data. This is illustrated in Figure 17.5. Another cause of transient faults is electromagnetic interference (EMI) from nearby devices or power fluctuations. EMI can couple with the circuits and cause voltage spikes or glitches that temporarily disrupt the normal operation of the hardware.\n\n\n\n\n\n\nFigure 17.5: Mechanism of Hardware Transient Fault Occurrence (Credit: NTT)\n\n\n\n\n\nMechanisms of Transient Faults\nTransient faults can manifest through different mechanisms depending on the affected hardware component. In memory devices like DRAM or SRAM, transient faults often lead to bit flips, where a single bit changes its value from 0 to 1 or vice versa. This can corrupt the stored data or instructions. In logic circuits, transient faults can cause glitches or voltage spikes propagating through the combinational logic, resulting in incorrect outputs or control signals. Transient faults can also affect communication channels, causing bit errors or packet losses during data transmission.\n\n\nImpact on ML Systems\nA common example of a transient fault is a bit flip in the main memory. If an important data structure or critical instruction is stored in the affected memory location, it can lead to incorrect computations or program misbehavior. If a transient fault occurs in the memory storing the model weights or gradients. For instance, a bit flip in the memory storing a loop counter can cause the loop to execute indefinitely or terminate prematurely. Transient faults in control registers or flag bits can alter the flow of program execution, leading to unexpected jumps or incorrect branch decisions. In communication systems, transient faults can corrupt transmitted data packets, resulting in retransmissions or data loss.\nIn ML systems, transient faults can have significant implications during the training phase (He et al. 2023). ML training involves iterative computations and updates to model parameters based on large datasets. If a transient fault occurs in the memory storing the model weights or gradients, it can lead to incorrect updates and compromise the convergence and accuracy of the training process. Figure 17.6 Show a real-world example from Google’s production fleet where an SDC anomaly caused a significant difference in the gradient norm.\n\n\n\n\n\n\nFigure 17.6: SDC in ML training phase results in anomalies in the gradient norm. (Credit: Jeff Dean, MLSys 2024 Keynote (Google))\n\n\n\nFor example, a bit flip in the weight matrix of a neural network can cause the model to learn incorrect patterns or associations, leading to degraded performance (Wan et al. 2021). Transient faults in the data pipeline, such as corruption of training samples or labels, can also introduce noise and affect the quality of the learned model.\n\nWan, Zishen, Aqeel Anwar, Yu-Shun Hsiao, Tianyu Jia, Vijay Janapa Reddi, and Arijit Raychowdhury. 2021. “Analyzing and Improving Fault Tolerance of Learning-Based Navigation Systems.” In 2021 58th ACM/IEEE Design Automation Conference (DAC), 841–46. IEEE; IEEE. https://doi.org/10.1109/dac18074.2021.9586116.\nDuring the inference phase, transient faults can impact the reliability and trustworthiness of ML predictions. If a transient fault occurs in the memory storing the trained model parameters or in the computation of the inference results, it can lead to incorrect or inconsistent predictions. For instance, a bit flip in the activation values of a neural network can alter the final classification or regression output (Mahmoud et al. 2020).\nIn safety-critical applications, such as autonomous vehicles or medical diagnosis, transient faults during inference can have severe consequences, leading to incorrect decisions or actions (G. Li et al. 2017; Jha et al. 2019). Ensuring the resilience of ML systems against transient faults is crucial to maintaining the integrity and reliability of the predictions.\n\nLi, Guanpeng, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, and Stephen W. Keckler. 2017. “Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1–12. ACM. https://doi.org/10.1145/3126908.3126964.\n\nCourbariaux, Matthieu, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. “Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to+ 1 or-1.” arXiv Preprint arXiv:1602.02830.\n\nAygun, Sercan, Ece Olcay Gunes, and Christophe De Vleeschouwer. 2021. “Efficient and Robust Bitstream Processing in Binarised Neural Networks.” Electron. Lett. 57 (5): 219–22. https://doi.org/10.1049/ell2.12045.\nAt the other extreme, in resource-constrained environments like TinyML, Binarized Neural Networks (BNNs)(Courbariaux et al. 2016) have emerged as a promising solution. BNNs represent network weights in single-bit precision, offering computational efficiency and faster inference times. However, this binary representation renders BNNs fragile to bit-flip errors on the network weights. For instance, prior work (Aygun, Gunes, and De Vleeschouwer 2021) has shown that a two-hidden layer BNN architecture for a simple task such as MNIST classification suffers performance degradation from 98% test accuracy to 70% when random bit-flipping soft errors are inserted through model weights with a 10% probability.\nAddressing such issues requires considering flip-aware training techniques or leveraging emerging computing paradigms (e.g., stochastic computing) to enhance fault tolerance and robustness, which we will discuss in Section 17.3.4. Future research directions aim to develop hybrid architectures, novel activation functions, and loss functions tailored to bridge the accuracy gap compared to full-precision models while maintaining their computational efficiency.\n\n\n\n17.3.2 Permanent Faults\nPermanent faults are hardware defects that persist and cause irreversible damage to the affected components. These faults are characterized by their persistent nature and require repair or replacement of the faulty hardware to restore normal system functionality.\n\nDefinition and Characteristics\nPermanent faults are hardware defects that cause persistent and irreversible malfunctions in the affected components. The faulty component remains non-operational until a permanent fault is repaired or replaced. These faults are characterized by their consistent and reproducible nature, meaning that the faulty behavior is observed every time the affected component is used. Permanent faults can impact various hardware components, such as processors, memory modules, storage devices, or interconnects, leading to system crashes, data corruption, or complete system failure.\nOne notable example of a permanent fault is the Intel FDIV bug, which was discovered in 1994. The FDIV bug was a flaw in certain Intel Pentium processors’ floating-point division (FDIV) units. The bug caused incorrect results for specific division operations, leading to inaccurate calculations.\nThe FDIV bug occurred due to an error in the lookup table used by the division unit. In rare cases, the processor would fetch an incorrect value from the lookup table, resulting in a slightly less precise result than expected. For instance, Figure 17.7 shows a fraction 4195835/3145727 plotted on a Pentium processor with the FDIV permanent fault. The triangular regions are where erroneous calculations occurred. Ideally, all correct values would round to 1.3338, but the erroneous results show 1.3337, indicating a mistake in the 5th digit.\nAlthough the error was small, it could compound over many division operations, leading to significant inaccuracies in mathematical calculations. The impact of the FDIV bug was significant, especially for applications that relied heavily on precise floating-point division, such as scientific simulations, financial calculations, and computer-aided design. The bug led to incorrect results, which could have severe consequences in fields like finance or engineering.\n\n\n\n\n\n\nFigure 17.7: Intel Pentium processor with the FDIV permanent fault. The triangular regions are where erroneous calculations occurred. (Credit: Byte Magazine)\n\n\n\nThe Intel FDIV bug is a cautionary tale for the potential impact of permanent faults on ML systems. In the context of ML, permanent faults in hardware components can lead to incorrect computations, affecting the accuracy and reliability of the models. For example, if an ML system relies on a processor with a faulty floating-point unit, similar to the Intel FDIV bug, it could introduce errors in the calculations performed during training or inference.\nThese errors can propagate through the model, leading to inaccurate predictions or skewed learning. In applications where ML is used for critical tasks, such as autonomous driving, medical diagnosis, or financial forecasting, the consequences of incorrect computations due to permanent faults can be severe.\nIt is crucial for ML practitioners to be aware of the potential impact of permanent faults and to incorporate fault-tolerant techniques, such as hardware redundancy, error detection and correction mechanisms, and robust algorithm design, to mitigate the risks associated with these faults. Additionally, thorough testing and validation of ML hardware components can help identify and address permanent faults before they impact the system’s performance and reliability.\n\n\nCauses of Permanent Faults\nPermanent faults can arise from several causes, including manufacturing defects and wear-out mechanisms. Manufacturing defects are inherent flaws introduced during the fabrication process of hardware components. These defects include improper etching, incorrect doping, or contamination, leading to non-functional or partially functional components.\nOn the other hand, wear-out mechanisms occur over time as the hardware components are subjected to prolonged use and stress. Factors such as electromigration, oxide breakdown, or thermal stress can cause gradual degradation of the components, eventually leading to permanent failures.\n\n\nMechanisms of Permanent Faults\nPermanent faults can manifest through various mechanisms, depending on the nature and location of the fault. Stuck-at faults (Seong et al. 2010) are common permanent faults where a signal or memory cell remains fixed at a particular value (either 0 or 1) regardless of the inputs, as illustrated in Figure 17.8.\n\nSeong, Nak Hee, Dong Hyuk Woo, Vijayalakshmi Srinivasan, Jude A. Rivers, and Hsien-Hsin S. Lee. 2010. “SAFER: Stuck-at-fault Error Recovery for Memories.” In 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 115–24. IEEE; IEEE. https://doi.org/10.1109/micro.2010.46.\n\n\n\n\n\n\nFigure 17.8: Stuck-at Fault Model in Digital Circuits (Credit: Accendo Reliability)\n\n\n\nStuck-at faults can occur in logic gates, memory cells, or interconnects, causing incorrect computations or data corruption. Another mechanism is device failures, where a component, such as a transistor or a memory cell, completely ceases to function. This can be due to manufacturing defects or severe wear-out. Bridging faults occur when two or more signal lines are unintentionally connected, causing short circuits or incorrect logic behavior.\nIn addition to stuck-at faults, there are several other types of permanent faults that can affect digital circuits that can impact an ML system. Delay faults can cause the propagation delay of a signal to exceed the specified limit, leading to timing violations. Interconnect faults, such as open faults (broken wires), resistive faults (increased resistance), or capacitive faults (increased capacitance), can cause signal integrity issues or timing violations. Memory cells can also suffer from various faults, including transition faults (inability to change state), coupling faults (interference between adjacent cells), and neighborhood pattern sensitive faults (faults that depend on the values of neighboring cells). Other permanent faults can occur in the power supply network or the clock distribution network, affecting the functionality and timing of the circuit.\n\n\nImpact on ML Systems\nPermanent faults can severely affect the behavior and reliability of computing systems. For example, a stuck-at-fault in a processor’s arithmetic logic unit (ALU) can cause incorrect computations, leading to erroneous results or system crashes. A permanent fault in a memory module, such as a stuck-at fault in a specific memory cell, can corrupt the stored data, causing data loss or program misbehavior. In storage devices, permanent faults like bad sectors or device failures can result in data inaccessibility or complete loss of stored information. Permanent interconnect faults can disrupt communication channels, causing data corruption or system hangs.\nPermanent faults can significantly affect ML systems during the training and inference phases. During training, permanent faults in processing units or memory can lead to incorrect computations, resulting in corrupted or suboptimal models (He et al. 2023). Furthermore, faults in storage devices can corrupt the training data or the stored model parameters, leading to data loss or model inconsistencies (He et al. 2023).\n\nZhang, Jeff Jun, Tianyu Gu, Kanad Basu, and Siddharth Garg. 2018. “Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator.” In 2018 IEEE 36th VLSI Test Symposium (VTS), 1–6. IEEE; IEEE. https://doi.org/10.1109/vts.2018.8368656.\nDuring inference, permanent faults can impact the reliability and correctness of ML predictions. Faults in the processing units can produce incorrect results or cause system failures, while faults in memory storing the model parameters can lead to corrupted or outdated models being used for inference (J. J. Zhang et al. 2018).\nTo mitigate the impact of permanent faults in ML systems, fault-tolerant techniques must be employed at both the hardware and software levels. Hardware redundancy, such as duplicating critical components or using error-correcting codes (Kim, Sullivan, and Erez 2015), can help detect and recover from permanent faults. Software techniques, such as checkpoint and restart mechanisms (Egwutuoha et al. 2013), can enable the system to recover from permanent faults by returning to a previously saved state. Regular monitoring, testing, and maintenance of ML systems can help identify and replace faulty components before they cause significant disruptions.\n\nKim, Jungrae, Michael Sullivan, and Mattan Erez. 2015. “Bamboo ECC: Strong, Safe, and Flexible Codes for Reliable Computer Memory.” In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), 101–12. IEEE; IEEE. https://doi.org/10.1109/hpca.2015.7056025.\n\nEgwutuoha, Ifeanyi P., David Levy, Bran Selic, and Shiping Chen. 2013. “A Survey of Fault Tolerance Mechanisms and Checkpoint/Restart Implementations for High Performance Computing Systems.” The Journal of Supercomputing 65 (3): 1302–26. https://doi.org/10.1007/s11227-013-0884-0.\nDesigning ML systems with fault tolerance in mind is crucial to ensure their reliability and robustness in the presence of permanent faults. This may involve incorporating redundancy, error detection and correction mechanisms, and fail-safe strategies into the system architecture. By proactively addressing the challenges posed by permanent faults, ML systems can maintain their integrity, accuracy, and trustworthiness, even in the face of hardware failures.\n\n\n\n17.3.3 Intermittent Faults\nIntermittent faults are hardware faults that occur sporadically and unpredictably in a system. An example is illustrated in Figure 17.9, where cracks in the material can introduce increased resistance in circuitry. These faults are particularly challenging to detect and diagnose because they appear and disappear intermittently, making it difficult to reproduce and isolate the root cause. Intermittent faults can lead to system instability, data corruption, and performance degradation.\n\n\n\n\n\n\nFigure 17.9: Increased resistance due to an intermittent fault – crack between copper bump and package solder (Credit: Constantinescu)\n\n\n\n\nDefinition and Characteristics\nIntermittent faults are characterized by their sporadic and non-deterministic nature. They occur irregularly and may appear and disappear spontaneously, with varying durations and frequencies. These faults do not consistently manifest every time the affected component is used, making them harder to detect than permanent faults. Intermittent faults can affect various hardware components, including processors, memory modules, storage devices, or interconnects. They can cause transient errors, data corruption, or unexpected system behavior.\nIntermittent faults can significantly impact the behavior and reliability of computing systems (Rashid, Pattabiraman, and Gopalakrishnan 2015). For example, an intermittent fault in a processor’s control logic can cause irregular program flow, leading to incorrect computations or system hangs. Intermittent faults in memory modules can corrupt data values, resulting in erroneous program execution or data inconsistencies. In storage devices, intermittent faults can cause read/write errors or data loss. Intermittent faults in communication channels can lead to data corruption, packet loss, or intermittent connectivity issues. These faults can cause system crashes, data integrity problems, or performance degradation, depending on the severity and frequency of the intermittent failures.\n\n———. 2015. “Characterizing the Impact of Intermittent Hardware Faults on Programs.” IEEE Trans. Reliab. 64 (1): 297–310. https://doi.org/10.1109/tr.2014.2363152.\n\n\nCauses of Intermittent Faults\nIntermittent faults can arise from several causes, both internal and external, to the hardware components (Constantinescu 2008). One common cause is aging and wear-out of the components. As electronic devices age, they become more susceptible to intermittent failures due to degradation mechanisms such as electromigration, oxide breakdown, or solder joint fatigue.\n\nConstantinescu, Cristian. 2008. “Intermittent Faults and Effects on Reliability of Integrated Circuits.” In 2008 Annual Reliability and Maintainability Symposium, 370–74. IEEE; IEEE. https://doi.org/10.1109/rams.2008.4925824.\nManufacturing defects or process variations can also introduce intermittent faults, where marginal or borderline components may exhibit sporadic failures under specific conditions, as shown in Figure 17.10.\nEnvironmental factors, such as temperature fluctuations, humidity, or vibrations, can trigger intermittent faults by altering the electrical characteristics of the components. Loose or degraded connections, such as those in connectors or printed circuit boards, can cause intermittent faults.\n\n\n\n\n\n\nFigure 17.10: Residue induced intermittent fault in a DRAM chip (Credit: Hynix Semiconductor)\n\n\n\n\n\nMechanisms of Intermittent Faults\nIntermittent faults can manifest through various mechanisms, depending on the underlying cause and the affected component. One mechanism is the intermittent open or short circuit, where a signal path or connection becomes temporarily disrupted or shorted, causing erratic behavior. Another mechanism is the intermittent delay fault (J. Zhang et al. 2018), where the timing of signals or propagation delays becomes inconsistent, leading to synchronization issues or incorrect computations. Intermittent faults can manifest as transient bit flips or soft errors in memory cells or registers, causing data corruption or incorrect program execution.\n\nZhang, Jeff, Kartheek Rangineni, Zahra Ghodsi, and Siddharth Garg. 2018. “ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Learning Accelerators.” In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465918.\n\n\nImpact on ML Systems\nIn the context of ML systems, intermittent faults can introduce significant challenges and impact the system’s reliability and performance. During the training phase, intermittent faults in processing units or memory can lead to inconsistencies in computations, resulting in incorrect or noisy gradients and weight updates. This can affect the convergence and accuracy of the training process, leading to suboptimal or unstable models. Intermittent data storage or retrieval faults can corrupt the training data, introducing noise or errors that degrade the quality of the learned models (He et al. 2023).\n\nHe, Yi, Mike Hutton, Steven Chan, Robert De Gruijl, Rama Govindaraju, Nishant Patil, and Yanjing Li. 2023. “Understanding and Mitigating Hardware Failures in Deep Learning Training Systems.” In Proceedings of the 50th Annual International Symposium on Computer Architecture, 1–16. IEEE; ACM. https://doi.org/10.1145/3579371.3589105.\nDuring the inference phase, intermittent faults can impact the reliability and consistency of ML predictions. Faults in the processing units or memory can cause incorrect computations or data corruption, leading to erroneous or inconsistent predictions. Intermittent faults in the data pipeline can introduce noise or errors in the input data, affecting the accuracy and robustness of the predictions. In safety-critical applications, such as autonomous vehicles or medical diagnosis systems, intermittent faults can have severe consequences, leading to incorrect decisions or actions that compromise safety and reliability.\nMitigating the impact of intermittent faults in ML systems requires a multifaceted approach (Rashid, Pattabiraman, and Gopalakrishnan 2012). At the hardware level, techniques such as robust design practices, component selection, and environmental control can help reduce the occurrence of intermittent faults. Redundancy and error correction mechanisms can be employed to detect and recover from intermittent failures. At the software level, runtime monitoring, anomaly detection, and fault-tolerant techniques can be incorporated into the ML pipeline. This may include techniques such as data validation, outlier detection, model ensembling, or runtime model adaptation to handle intermittent faults gracefully.\n\nRashid, Layali, Karthik Pattabiraman, and Sathish Gopalakrishnan. 2012. “Intermittent Hardware Errors Recovery: Modeling and Evaluation.” In 2012 Ninth International Conference on Quantitative Evaluation of Systems, 220–29. IEEE; IEEE. https://doi.org/10.1109/qest.2012.37.\nDesigning ML systems resilient to intermittent faults is crucial to ensuring their reliability and robustness. This involves incorporating fault-tolerant techniques, runtime monitoring, and adaptive mechanisms into the system architecture. By proactively addressing the challenges of intermittent faults, ML systems can maintain their accuracy, consistency, and trustworthiness, even in sporadic hardware failures. Regular testing, monitoring, and maintenance of ML systems can help identify and mitigate intermittent faults before they cause significant disruptions or performance degradation.\n\n\n\n17.3.4 Detection and Mitigation\nThis section explores various fault detection techniques, including hardware-level and software-level approaches, and discusses effective mitigation strategies to enhance the resilience of ML systems. Additionally, we will look into resilient ML system design considerations, present case studies and examples, and highlight future research directions in fault-tolerant ML systems.\n\nFault Detection Techniques\nFault detection techniques are important for identifying and localizing hardware faults in ML systems. These techniques can be broadly categorized into hardware-level and software-level approaches, each offering unique capabilities and advantages.\n\nHardware-level fault detection\nHardware-level fault detection techniques are implemented at the physical level of the system and aim to identify faults in the underlying hardware components. There are several hardware techniques, but broadly, we can bucket these different mechanisms into the following categories.\nBuilt-in self-test (BIST) mechanisms: BIST is a powerful technique for detecting faults in hardware components (Bushnell and Agrawal 2002). It involves incorporating additional hardware circuitry into the system for self-testing and fault detection. BIST can be applied to various components, such as processors, memory modules, or application-specific integrated circuits (ASICs). For example, BIST can be implemented in a processor using scan chains, which are dedicated paths that allow access to internal registers and logic for testing purposes.\n\nBushnell, Michael L, and Vishwani D Agrawal. 2002. “Built-in Self-Test.” Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, 489–548.\nDuring the BIST process, predefined test patterns are applied to the processor’s internal circuitry, and the responses are compared against expected values. Any discrepancies indicate the presence of faults. Intel’s Xeon processors, for instance, include BIST mechanisms to test the CPU cores, cache memory, and other critical components during system startup.\nError detection codes: Error detection codes are widely used to detect data storage and transmission errors (Hamming 1950). These codes add redundant bits to the original data, allowing the detection of bit errors. Example: Parity checks are a simple form of error detection code shown in Figure 17.11. In a single-bit parity scheme, an extra bit is appended to each data word, making the number of 1s in the word even (even parity) or odd (odd parity).\n\nHamming, R. W. 1950. “Error Detecting and Error Correcting Codes.” Bell Syst. Tech. J. 29 (2): 147–60. https://doi.org/10.1002/j.1538-7305.1950.tb00463.x.\n\n\n\n\n\n\nFigure 17.11: Parity bit example (Credit: Computer Hope)\n\n\n\nWhen reading the data, the parity is checked, and if it doesn’t match the expected value, an error is detected. More advanced error detection codes, such as cyclic redundancy checks (CRC), calculate a checksum based on the data and append it to the message. The checksum is recalculated at the receiving end and compared with the transmitted checksum to detect errors. Error-correcting code (ECC) memory modules, commonly used in servers and critical systems, employ advanced error detection and correction codes to detect and correct single-bit or multi-bit errors in memory.\nHardware redundancy and voting mechanisms: Hardware redundancy involves duplicating critical components and comparing their outputs to detect and mask faults (Sheaffer, Luebke, and Skadron 2007). Voting mechanisms, such as triple modular redundancy (TMR), employ multiple instances of a component and compare their outputs to identify and mask faulty behavior (Arifeen, Hassan, and Lee 2020).\n\nSheaffer, Jeremy W, David P Luebke, and Kevin Skadron. 2007. “A Hardware Redundancy and Recovery Mechanism for Reliable Scientific Computation on Graphics Processors.” In Graphics Hardware, 2007:55–64. Citeseer.\n\nArifeen, Tooba, Abdus Sami Hassan, and Jeong-A Lee. 2020. “Approximate Triple Modular Redundancy: A Survey.” #IEEE_O_ACC# 8: 139851–67. https://doi.org/10.1109/access.2020.3012673.\n\nYeh, Y. C. 1996. “Triple-Triple Redundant 777 Primary Flight Computer.” In 1996 IEEE Aerospace Applications Conference. Proceedings, 1:293–307. IEEE; IEEE. https://doi.org/10.1109/aero.1996.495891.\nIn a TMR system, three identical instances of a hardware component, such as a processor or a sensor, perform the same computation in parallel. The outputs of these instances are fed into a voting circuit, which compares the results and selects the majority value as the final output. If one of the instances produces an incorrect result due to a fault, the voting mechanism masks the error and maintains the correct output. TMR is commonly used in aerospace and aviation systems, where high reliability is critical. For instance, the Boeing 777 aircraft employs TMR in its primary flight computer system to ensure the availability and correctness of flight control functions (Yeh 1996).\nTesla’s self-driving computers employ a redundant hardware architecture to ensure the safety and reliability of critical functions, such as perception, decision-making, and vehicle control, as shown in Figure 17.12. One key component of this architecture is using dual modular redundancy (DMR) in the car’s onboard computer systems.\n\n\n\n\n\n\nFigure 17.12: Tesla full self-driving computer with dual redundant SoCs (Credit: Tesla)\n\n\n\nIn Tesla’s DMR implementation, two identical hardware units, often called “redundant computers” or “redundant control units,” perform the same computations in parallel (Bannon et al. 2019). Each unit independently processes sensor data, executes perception and decision-making algorithms, and generates control commands for the vehicle’s actuators (e.g., steering, acceleration, and braking).\n\nBannon, Pete, Ganesh Venkataramanan, Debjit Das Sarma, and Emil Talpes. 2019. “Computer and Redundancy Solution for the Full Self-Driving Computer.” In 2019 IEEE Hot Chips 31 Symposium (HCS), 1–22. IEEE Computer Society; IEEE. https://doi.org/10.1109/hotchips.2019.8875645.\nThe outputs of these two redundant units are continuously compared to detect any discrepancies or faults. If the outputs match, the system assumes that both units function correctly, and the control commands are sent to the vehicle’s actuators. However, if there is a mismatch between the outputs, the system identifies a potential fault in one of the units and takes appropriate action to ensure safe operation.\nThe system may employ additional mechanisms to determine which unit is faulty in a mismatch. This can involve using diagnostic algorithms, comparing the outputs with data from other sensors or subsystems, or analyzing the consistency of the outputs over time. Once the faulty unit is identified, the system can isolate it and continue operating using the output from the non-faulty unit.\nDMR in Tesla’s self-driving computer provides an extra safety and fault tolerance layer. By having two independent units performing the same computations, the system can detect and mitigate faults that may occur in one of the units. This redundancy helps prevent single points of failure and ensures that critical functions remain operational despite hardware faults.\nFurthermore, Tesla also incorporates additional redundancy mechanisms beyond DMR. For example, they utilize redundant power supplies, steering and braking systems, and diverse sensor suites (e.g., cameras, radar, and ultrasonic sensors) to provide multiple layers of fault tolerance. These redundancies collectively contribute to the overall safety and reliability of the self-driving system.\nIt’s important to note that while DMR provides fault detection and some level of fault tolerance, TMR may provide a different level of fault masking. In DMR, if both units experience simultaneous faults or the fault affects the comparison mechanism, the system may be unable to identify the fault. Therefore, Tesla’s SDCs rely on a combination of DMR and other redundancy mechanisms to achieve a high level of fault tolerance.\nThe use of DMR in Tesla’s self-driving computer highlights the importance of hardware redundancy in safety-critical applications. By employing redundant computing units and comparing their outputs, the system can detect and mitigate faults, enhancing the overall safety and reliability of the self-driving functionality.\nGoogle employs redundant hot spares to deal with SDC issues within its data centers, thereby enhancing the reliability of critical functions. As illustrated in Figure 17.13, during the normal training phase, multiple synchronous training workers function flawlessly. However, if a worker becomes defective and causes SDC, an SDC checker automatically identifies the issues. Upon detecting the SDC, the SDC checker moves the training to a hot spare and sends the defective machine for repair. This redundancy safeguards the continuity and reliability of ML training, effectively minimizing downtime and preserving data integrity.\n\n\n\n\n\n\nFigure 17.13: Google employs hot spare cores to transparently handle SDCs in the data center. (Credit: Jeff Dean, MLSys 2024 Keynote (Google))\n\n\n\nWatchdog timers: Watchdog timers are hardware components that monitor the execution of critical tasks or processes (Pont and Ong 2002). They are commonly used to detect and recover from software or hardware faults that cause a system to become unresponsive or stuck in an infinite loop. In an embedded system, a watchdog timer can be configured to monitor the execution of the main control loop, as illustrated in Figure 17.14. The software periodically resets the watchdog timer to indicate that it functions correctly. Suppose the software fails to reset the timer within a specified time limit (timeout period). In that case, the watchdog timer assumes that the system has encountered a fault and triggers a predefined recovery action, such as resetting the system or switching to a backup component. Watchdog timers are widely used in automotive electronics, industrial control systems, and other safety-critical applications to ensure the timely detection and recovery from faults.\n\nPont, Michael J, and Royan HL Ong. 2002. “Using Watchdog Timers to Improve the Reliability of Single-Processor Embedded Systems: Seven New Patterns and a Case Study.” In Proceedings of the First Nordic Conference on Pattern Languages of Programs, 159–200. Citeseer.\n\n\n\n\n\n\nFigure 17.14: Watchdog timer example in detecting MCU faults (Credit: Ablic)\n\n\n\n\n\nSoftware-level fault detection\nSoftware-level fault detection techniques rely on software algorithms and monitoring mechanisms to identify system faults. These techniques can be implemented at various levels of the software stack, including the operating system, middleware, or application level.\nRuntime monitoring and anomaly detection: Runtime monitoring involves continuously observing the behavior of the system and its components during execution (Francalanza et al. 2017). It helps detect anomalies, errors, or unexpected behavior that may indicate the presence of faults. For example, consider an ML-based image classification system deployed in a self-driving car. Runtime monitoring can be implemented to track the classification model’s performance and behavior (Mahmoud et al. 2021).\n\nFrancalanza, Adrian, Luca Aceto, Antonis Achilleos, Duncan Paul Attard, Ian Cassar, Dario Della Monica, and Anna Ingólfsdóttir. 2017. “A Foundation for Runtime Monitoring.” In International Conference on Runtime Verification, 8–29. Springer.\n\nMahmoud, Abdulrahman, Siva Kumar Sastry Hari, Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael B. Sullivan, Timothy Tsai, and Stephen W. Keckler. 2021. “Optimizing Selective Protection for CNN Resilience.” In 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 127–38. IEEE. https://doi.org/10.1109/issre52982.2021.00025.\n\nChandola, Varun, Arindam Banerjee, and Vipin Kumar. 2009. “Anomaly Detection: A Survey.” ACM Comput. Surv. 41 (3): 1–58. https://doi.org/10.1145/1541880.1541882.\nAnomaly detection algorithms can be applied to the model’s predictions or intermediate layer activations, such as statistical outlier detection or machine learning-based approaches (e.g., One-Class SVM or Autoencoders) (Chandola, Banerjee, and Kumar 2009). Figure 17.15 shows example of anomaly detection. Suppose the monitoring system detects a significant deviation from the expected patterns, such as a sudden drop in classification accuracy or out-of-distribution samples. In that case, it can raise an alert indicating a potential fault in the model or the input data pipeline. This early detection allows for timely intervention and fault mitigation strategies to be applied.\n\n\n\n\n\n\nFigure 17.15: Examples of anomaly detection. (a) Fully supervised anomaly detection, (b) normal-only anomaly detection, (c, d, e) semi-supervised anomaly detection, (f) unsupervised anomaly detection (Credit: Google)\n\n\n\nConsistency checks and data validation: Consistency checks and data validation techniques ensure data integrity and correctness at different processing stages in an ML system (Lindholm et al. 2019). These checks help detect data corruption, inconsistencies, or errors that may propagate and affect the system’s behavior. Example: In a distributed ML system where multiple nodes collaborate to train a model, consistency checks can be implemented to validate the integrity of the shared model parameters. Each node can compute a checksum or hash of the model parameters before and after the training iteration, as shown in Figure 17.15. Any inconsistencies or data corruption can be detected by comparing the checksums across nodes. Additionally, range checks can be applied to the input data and model outputs to ensure they fall within expected bounds. For instance, if an autonomous vehicle’s perception system detects an object with unrealistic dimensions or velocities, it can indicate a fault in the sensor data or the perception algorithms (Wan et al. 2023).\n\nLindholm, Andreas, Dave Zachariah, Petre Stoica, and Thomas B. Schon. 2019. “Data Consistency Approach to Model Validation.” #IEEE_O_ACC# 7: 59788–96. https://doi.org/10.1109/access.2019.2915109.\n\nWan, Zishen, Yiming Gan, Bo Yu, S Liu, A Raychowdhury, and Y Zhu. 2023. “Vpp: The Vulnerability-Proportional Protection Paradigm Towards Reliable Autonomous Machines.” In Proceedings of the 5th International Workshop on Domain Specific System Architecture (DOSSA), 1–6.\n\nKawazoe Aguilera, Marcos, Wei Chen, and Sam Toueg. 1997. “Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication.” In Distributed Algorithms: 11th International Workshop, WDAG’97 Saarbrücken, Germany, September 2426, 1997 Proceedings 11, 126–40. Springer.\nHeartbeat and timeout mechanisms: Heartbeat mechanisms and timeouts are commonly used to detect faults in distributed systems and ensure the liveness and responsiveness of components (Kawazoe Aguilera, Chen, and Toueg 1997). These are quite similar to the watchdog timers found in hardware. For example, in a distributed ML system, where multiple nodes collaborate to perform tasks such as data preprocessing, model training, or inference, heartbeat mechanisms can be implemented to monitor the health and availability of each node. Each node periodically sends a heartbeat message to a central coordinator or its peer nodes, indicating its status and availability. Suppose a node fails to send a heartbeat within a specified timeout period, as shown in Figure 17.16. In that case, it is considered faulty, and appropriate actions can be taken, such as redistributing the workload or initiating a failover mechanism. Timeouts can also be used to detect and handle hanging or unresponsive components. For example, if a data loading process exceeds a predefined timeout threshold, it may indicate a fault in the data pipeline, and the system can take corrective measures.\n\n\n\n\n\n\nFigure 17.16: Heartbeat messages in distributed systems (Credit: GeeksforGeeks)\n\n\n\n\nSoftware-implemented fault tolerance (SIFT) techniques: SIFT techniques introduce redundancy and fault detection mechanisms at the software level to enhance the reliability and fault tolerance of the system (Reis et al. 2005). Example: N-version programming is a SIFT technique where multiple functionally equivalent software component versions are developed independently by different teams. This can be applied to critical components such as the model inference engine in an ML system. Multiple versions of the inference engine can be executed in parallel, and their outputs can be compared for consistency. It is considered the correct result if most versions produce the same output. If there is a discrepancy, it indicates a potential fault in one or more versions, and appropriate error-handling mechanisms can be triggered. Another example is using software-based error correction codes, such as Reed-Solomon codes (Plank 1997), to detect and correct errors in data storage or transmission, as shown in Figure 17.17. These codes add redundancy to the data, enabling detecting and correcting certain errors and enhancing the system’s fault tolerance.\n\nReis, G. A., J. Chang, N. Vachharajani, R. Rangan, and D. I. August. 2005. “SWIFT: Software Implemented Fault Tolerance.” In International Symposium on Code Generation and Optimization, 243–54. IEEE; IEEE. https://doi.org/10.1109/cgo.2005.34.\n\nPlank, James S. 1997. “A Tutorial on ReedSolomon Coding for Fault-Tolerance in RAID-Like Systems.” Software: Practice and Experience 27 (9): 995–1012.\n\n\n\n\n\n\nFigure 17.17: n-bits representation of the Reed-Solomon codes (Credit: GeeksforGeeks)\n\n\n\n\n\n\n\n\n\nExercise 17.1: Anomaly Detection\n\n\n\n\n\nIn this Colab, play the role of an AI fault detective! You’ll build an autoencoder-based anomaly detector to pinpoint errors in heart health data. Learn how to identify malfunctions in ML systems, a vital skill for creating dependable AI. We’ll use Keras Tuner to fine-tune your autoencoder for top-notch fault detection. This experience directly links to the Robust AI chapter, demonstrating the importance of fault detection in real-world applications like healthcare and autonomous systems. Get ready to strengthen the reliability of your AI creations!\n\n\n\n\n\n\n\n\n17.3.5 Summary\nTable 17.1 provides an extensive comparative analysis of transient, permanent, and intermittent faults. It outlines the primary characteristics or dimensions that distinguish these fault types. Here, we summarize the relevant dimensions we examined and explore the nuances that differentiate transient, permanent, and intermittent faults in greater detail.\n\n\n\nTable 17.1: Comparison of transient, permanent, and intermittent faults.\n\n\n\n\n\n\n\n\n\n\n\nDimension\nTransient Faults\nPermanent Faults\nIntermittent Faults\n\n\n\n\nDuration\nShort-lived, temporary\nPersistent, remains until repair or replacement\nSporadic, appears and disappears intermittently\n\n\nPersistence\nDisappears after the fault condition passes\nConsistently present until addressed\nRecurs irregularly, not always present\n\n\nCauses\nExternal factors (e.g., electromagnetic interference, cosmic rays)\nHardware defects, physical damage, wear-out\nUnstable hardware conditions, loose connections, aging components\n\n\nManifestation\nBit flips, glitches, temporary data corruption\nStuck-at faults, broken components, complete device failures\nOccasional bit flips, intermittent signal issues, sporadic malfunctions\n\n\nImpact on ML Systems\nIntroduces temporary errors or noise in computations\nCauses consistent errors or failures, affecting reliability\nLeads to sporadic and unpredictable errors, challenging to diagnose and mitigate\n\n\nDetection\nError detection codes, comparison with expected values\nBuilt-in self-tests, error detection codes, consistency checks\nMonitoring for anomalies, analyzing error patterns and correlations\n\n\nMitigation\nError correction codes, redundancy, checkpoint and restart\nHardware repair or replacement, component redundancy, failover mechanisms\nRobust design, environmental control, runtime monitoring, fault-tolerant techniques", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1936,7 +1936,7 @@ "section": "17.4 ML Model Robustness", "text": "17.4 ML Model Robustness\n\n17.4.1 Adversarial Attacks\n\nDefinition and Characteristics\nAdversarial attacks aim to trick models into making incorrect predictions by providing them with specially crafted, deceptive inputs (called adversarial examples) (Parrish et al. 2023). By adding slight perturbations to input data, adversaries can \"hack\" a model’s pattern recognition and deceive it. These are sophisticated techniques where slight, often imperceptible alterations to input data can trick an ML model into making a wrong prediction, as shown in Figure 17.18.\n\nParrish, Alicia, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, Oana Inel, Juan Ciro, et al. 2023. “Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.” ArXiv Preprint abs/2305.14384. https://arxiv.org/abs/2305.14384.\n\n\n\n\n\n\nFigure 17.18: A small adversarial noise added to the original image can make the neural network classify the image as a Guacamole instead of an Egyptian cat (Credit: Sutanto)\n\n\n\nOne can generate prompts that lead to unsafe images in text-to-image models like DALLE (Ramesh et al. 2021) or Stable Diffusion (Rombach et al. 2022). For example, by altering the pixel values of an image, attackers can deceive a facial recognition system into identifying a face as a different person.\n\nRamesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. “Zero-Shot Text-to-Image Generation.” In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, edited by Marina Meila and Tong Zhang, 139:8821–31. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/ramesh21a.html.\n\nRombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. 2022. “High-Resolution Image Synthesis with Latent Diffusion Models.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr52688.2022.01042.\nAdversarial attacks exploit the way ML models learn and make decisions during inference. These models work on the principle of recognizing patterns in data. An adversary crafts special inputs with perturbations to mislead the model’s pattern recognition---essentially ‘hacking’ the model’s perceptions.\nAdversarial attacks fall under different scenarios:\n\nWhitebox Attacks: The attacker fully knows the target model’s internal workings, including the training data, parameters, and architecture (Ye and Hamidi 2021). This comprehensive access creates favorable conditions for attackers to exploit the model’s vulnerabilities. The attacker can use specific and subtle weaknesses to craft effective adversarial examples.\nBlackbox Attacks: In contrast to white-box attacks, black-box attacks involve the attacker having little to no knowledge of the target model (Guo et al. 2019). To carry out the attack, the adversarial actor must carefully observe the model’s output behavior.\nGreybox Attacks: These fall between blackbox and whitebox attacks. The attacker has only partial knowledge about the target model’s internal design (Xu et al. 2021). For example, the attacker could have knowledge about training data but not the architecture or parameters. In the real world, practical attacks fall under black black-box box grey-boxes.\n\n\nYe, Linfeng, and Shayan Mohajer Hamidi. 2021. “Thundernna: A White Box Adversarial Attack.” arXiv Preprint arXiv:2111.12305.\n\nGuo, Chuan, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger. 2019. “Simple Black-Box Adversarial Attacks.” In International Conference on Machine Learning, 2484–93. PMLR.\n\nXu, Ying, Xu Zhong, Antonio Jimeno Yepes, and Jey Han Lau. 2021. “Grey-Box Adversarial Attack and Defence for Sentiment Classification.” arXiv Preprint arXiv:2103.11576.\nThe landscape of machine learning models is complex and broad, especially given their relatively recent integration into commercial applications. This rapid adoption, while transformative, has brought to light numerous vulnerabilities within these models. Consequently, various adversarial attack methods have emerged, each strategically exploiting different aspects of different models. Below, we highlight a subset of these methods, showcasing the multifaceted nature of adversarial attacks on machine learning models:\n\nGenerative Adversarial Networks (GANs) are deep learning models that consist of two networks competing against each other: a generator and a discriminator (Goodfellow et al. 2020). The generator tries to synthesize realistic data while the discriminator evaluates whether they are real or fake. GANs can be used to craft adversarial examples. The generator network is trained to produce inputs that the target model misclassifies. These GAN-generated images can then attack a target classifier or detection model. The generator and the target model are engaged in a competitive process, with the generator continually improving its ability to create deceptive examples and the target model enhancing its resistance to such examples. GANs provide a powerful framework for crafting complex and diverse adversarial inputs, illustrating the adaptability of generative models in the adversarial landscape.\nTransfer Learning Adversarial Attacks exploit the knowledge transferred from a pre-trained model to a target model, creating adversarial examples that can deceive both models. These attacks pose a growing concern, particularly when adversaries have knowledge of the feature extractor but lack access to the classification head (the part or layer responsible for making the final classifications). Referred to as \"headless attacks,\" these transferable adversarial strategies leverage the expressive capabilities of feature extractors to craft perturbations while being oblivious to the label space or training data. The existence of such attacks underscores the importance of developing robust defenses for transfer learning applications, especially since pre-trained models are commonly used (Abdelkader et al. 2020).\n\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. “Generative Adversarial Networks.” Commun. ACM 63 (11): 139–44. https://doi.org/10.1145/3422622.\n\nAbdelkader, Ahmed, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi Schwarzschild, Manli Shu, Christoph Studer, and Chen Zhu. 2020. “Headless Horseman: Adversarial Attacks on Transfer Learning Models.” In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3087–91. IEEE. https://doi.org/10.1109/icassp40776.2020.9053181.\n\n\nMechanisms of Adversarial Attacks\n\n\n\n\n\n\nFigure 17.19: Gradient-Based Attacks (Credit: Ivezic)\n\n\n\nGradient-based Attacks\nOne prominent category of adversarial attacks is gradient-based attacks. These attacks leverage the gradients of the ML model’s loss function to craft adversarial examples. The Fast Gradient Sign Method (FGSM) is a well-known technique in this category. FGSM perturbs the input data by adding small noise in the gradient direction, aiming to maximize the model’s prediction error. FGSM can quickly generate adversarial examples, as shown in Figure 17.19, by taking a single step in the gradient direction.\nAnother variant, the Projected Gradient Descent (PGD) attack, extends FGSM by iteratively applying the gradient update step, allowing for more refined and powerful adversarial examples. The Jacobian-based Saliency Map Attack (JSMA) is another gradient-based approach that identifies the most influential input features and perturbs them to create adversarial examples.\nOptimization-based Attacks\nThese attacks formulate the generation of adversarial examples as an optimization problem. The Carlini and Wagner (C&W) attack is a prominent example in this category. It aims to find the smallest perturbation that can cause misclassification while maintaining the perceptual similarity to the original input. The C&W attack employs an iterative optimization process to minimize the perturbation while maximizing the model’s prediction error.\nAnother optimization-based approach is the Elastic Net Attack to DNNs (EAD), which incorporates elastic net regularization to generate adversarial examples with sparse perturbations.\nTransfer-based Attacks\nTransfer-based attacks exploit the transferability property of adversarial examples. Transferability refers to the phenomenon where adversarial examples crafted for one ML model can often fool other models, even if they have different architectures or were trained on different datasets. This enables attackers to generate adversarial examples using a surrogate model and then transfer them to the target model without requiring direct access to its parameters or gradients. Transfer-based attacks highlight the generalization of adversarial vulnerabilities across different models and the potential for black-box attacks.\nPhysical-world Attacks\nPhysical-world attacks bring adversarial examples into the realm of real-world scenarios. These attacks involve creating physical objects or manipulations that can deceive ML models when captured by sensors or cameras. Adversarial patches, for example, are small, carefully designed patches that can be placed on objects to fool object detection or classification models. When attached to real-world objects, these patches can cause models to misclassify or fail to detect the objects accurately. Adversarial objects, such as 3D-printed sculptures or modified road signs, can also be crafted to deceive ML systems in physical environments.\nSummary\nTable 17.2 a concise overview of the different categories of adversarial attacks, including gradient-based attacks (FGSM, PGD, JSMA), optimization-based attacks (C&W, EAD), transfer-based attacks, and physical-world attacks (adversarial patches and objects). Each attack is briefly described, highlighting its key characteristics and mechanisms.\n\n\n\nTable 17.2: Different attack types on ML models.\n\n\n\n\n\n\n\n\n\n\nAttack Category\nAttack Name\nDescription\n\n\n\n\nGradient-based\nFast Gradient Sign Method (FGSM)\nPerturbs input data by adding small noise in the gradient direction to maximize prediction error.\n\n\n\nProjected Gradient Descent (PGD)\nExtends FGSM by iteratively applying the gradient update step for more refined adversarial examples.\n\n\n\nJacobian-based Saliency Map Attack (JSMA)\nIdentifies influential input features and perturbs them to create adversarial examples.\n\n\nOptimization-based\nCarlini and Wagner (C&W) Attack\nFinds the smallest perturbation that causes misclassification while maintaining perceptual similarity.\n\n\n\nElastic Net Attack to DNNs (EAD)\nIncorporates elastic net regularization to generate adversarial examples with sparse perturbations.\n\n\nTransfer-based\nTransferability-based Attacks\nExploits the transferability of adversarial examples across different models, enabling black-box attacks.\n\n\nPhysical-world\nAdversarial Patches\nSmall, carefully designed patches placed on objects to fool object detection or classification models.\n\n\n\nAdversarial Objects\nPhysical objects (e.g., 3D-printed sculptures, modified road signs) crafted to deceive ML systems in real-world scenarios.\n\n\n\n\n\n\nThe mechanisms of adversarial attacks reveal the intricate interplay between the ML model’s decision boundaries, the input data, and the attacker’s objectives. By carefully manipulating the input data, attackers can exploit the model’s sensitivities and blind spots, leading to incorrect predictions. The success of adversarial attacks highlights the need for a deeper understanding of ML models’ robustness and generalization properties.\nDefending against adversarial attacks requires a multifaceted approach. Adversarial training is one common defense strategy in which models are trained on adversarial examples to improve robustness. Exposing the model to adversarial examples during training teaches it to classify them correctly and become more resilient to attacks. Defensive distillation, input preprocessing, and ensemble methods are other techniques that can help mitigate the impact of adversarial attacks.\nAs adversarial machine learning evolves, researchers explore new attack mechanisms and develop more sophisticated defenses. The arms race between attackers and defenders drives the need for constant innovation and vigilance in securing ML systems against adversarial threats. Understanding the mechanisms of adversarial attacks is crucial for developing robust and reliable ML models that can withstand the ever-evolving landscape of adversarial examples.\n\n\nImpact on ML Systems\nAdversarial attacks on machine learning systems have emerged as a significant concern in recent years, highlighting the potential vulnerabilities and risks associated with the widespread adoption of ML technologies. These attacks involve carefully crafted perturbations to input data that can deceive or mislead ML models, leading to incorrect predictions or misclassifications, as shown in Figure 17.20. The impact of adversarial attacks on ML systems is far-reaching and can have serious consequences in various domains.\nOne striking example of the impact of adversarial attacks was demonstrated by researchers in 2017. They experimented with small black and white stickers on stop signs (Eykholt et al. 2017). To the human eye, these stickers did not obscure the sign or prevent its interpretability. However, when images of the sticker-modified stop signs were fed into standard traffic sign classification ML models, a shocking result emerged. The models misclassified the stop signs as speed limit signs over 85% of the time.\n\nEykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2017. “Robust Physical-World Attacks on Deep Learning Models.” ArXiv Preprint abs/1707.08945. https://arxiv.org/abs/1707.08945.\nThis demonstration shed light on the alarming potential of simple adversarial stickers to trick ML systems into misreading critical road signs. The implications of such attacks in the real world are significant, particularly in the context of autonomous vehicles. If deployed on actual roads, these adversarial stickers could cause self-driving cars to misinterpret stop signs as speed limits, leading to dangerous situations, as shown in Figure 17.21. Researchers warned that this could result in rolling stops or unintended acceleration into intersections, endangering public safety.\n\n\n\n\n\n\nFigure 17.20: Adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet (Credit: Goodfellow)\n\n\n\n\n\n\n\n\n\nFigure 17.21: Graffiti on a stop sign tricked a self-driving car into thinking it was a 45 mph speed limit sign (Credit: Eykholt)\n\n\n\nThe case study of the adversarial stickers on stop signs provides a concrete illustration of how adversarial examples exploit how ML models recognize patterns. By subtly manipulating the input data in ways that are invisible to humans, attackers can induce incorrect predictions and create serious risks, especially in safety-critical applications like autonomous vehicles. The attack’s simplicity highlights the vulnerability of ML models to even minor changes in the input, emphasizing the need for robust defenses against such threats.\nThe impact of adversarial attacks extends beyond the degradation of model performance. These attacks raise significant security and safety concerns, particularly in domains where ML models are relied upon for critical decision-making. In healthcare applications, adversarial attacks on medical imaging models could lead to misdiagnosis or incorrect treatment recommendations, jeopardizing patient well-being (M.-J. Tsai, Lin, and Lee 2023). In financial systems, adversarial attacks could enable fraud or manipulation of trading algorithms, resulting in substantial economic losses.\n\nTsai, Min-Jen, Ping-Yi Lin, and Ming-En Lee. 2023. “Adversarial Attacks on Medical Image Classification.” Cancers 15 (17): 4228. https://doi.org/10.3390/cancers15174228.\n\nFursov, Ivan, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun, Rodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey Zaytsev, and Evgeny Burnaev. 2021. “Adversarial Attacks on Deep Models for Financial Transaction Records.” In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &Amp; Data Mining, 2868–78. ACM. https://doi.org/10.1145/3447548.3467145.\nMoreover, adversarial vulnerabilities undermine the trustworthiness and interpretability of ML models. If carefully crafted perturbations can easily fool models, confidence in their predictions and decisions erodes. Adversarial examples expose the models’ reliance on superficial patterns and the inability to capture the true underlying concepts, challenging the reliability of ML systems (Fursov et al. 2021).\nDefending against adversarial attacks often requires additional computational resources and can impact the overall system performance. Techniques like adversarial training, where models are trained on adversarial examples to improve robustness, can significantly increase training time and computational requirements (Bai et al. 2021). Runtime detection and mitigation mechanisms, such as input preprocessing (Addepalli et al. 2020) or prediction consistency checks, introduce latency and affect the real-time performance of ML systems.\n\nBai, Tao, Jinqi Luo, Jun Zhao, Bihan Wen, and Qian Wang. 2021. “Recent Advances in Adversarial Training for Adversarial Robustness.” arXiv Preprint arXiv:2102.01356.\n\nAddepalli, Sravanti, B. S. Vivek, Arya Baburaj, Gaurang Sriramanan, and R. Venkatesh Babu. 2020. “Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1020–29. IEEE. https://doi.org/10.1109/cvpr42600.2020.00110.\nThe presence of adversarial vulnerabilities also complicates the deployment and maintenance of ML systems. System designers and operators must consider the potential for adversarial attacks and incorporate appropriate defenses and monitoring mechanisms. Regular updates and retraining of models become necessary to adapt to new adversarial techniques and maintain system security and performance over time.\nThe impact of adversarial attacks on ML systems is significant and multifaceted. These attacks expose ML models’ vulnerabilities, from degrading model performance and raising security and safety concerns to challenging model trustworthiness and interpretability. Developers and researchers must prioritize the development of robust defenses and countermeasures to mitigate the risks posed by adversarial attacks. By addressing these challenges, we can build more secure, reliable, and trustworthy ML systems that can withstand the ever-evolving landscape of adversarial threats.\n\n\n\n\n\n\nExercise 17.2: Adversarial Attacks\n\n\n\n\n\nGet ready to become an AI adversary! In this Colab, you’ll become a white-box hacker, learning to craft attacks that deceive image classification models. We’ll focus on the Fast Gradient Sign Method (FGSM), where you’ll weaponize a model’s gradients against it! You’ll deliberately distort images with tiny perturbations, observing how they increasingly fool the AI more intensely. This hands-on exercise highlights the importance of building secure AI – a critical skill as AI integrates into cars and healthcare. The Colab directly ties into the Robust AI chapter of your book, moving adversarial attacks from theory into your own hands-on experience.\n\nThink you can outsmart an AI? In this Colab, learn how to trick image classification models with adversarial attacks. We’ll use methods like FGSM to change images and subtly fool the AI. Discover how to design deceptive image patches and witness the surprising vulnerability of these powerful models. This is crucial knowledge for building truly robust AI systems!\n\n\n\n\n\n\n\n17.4.2 Data Poisoning\n\nDefinition and Characteristics\nData poisoning is an attack where the training data is tampered with, leading to a compromised model (Biggio, Nelson, and Laskov 2012), as shown in Figure 17.22. Attackers can modify existing training examples, insert new malicious data points, or influence the data collection process. The poisoned data is labeled in such a way as to skew the model’s learned behavior. This can be particularly damaging in applications where ML models make automated decisions based on learned patterns. Beyond training sets, poisoning tests, and validation data can allow adversaries to boost reported model performance artificially.\n\nBiggio, Battista, Blaine Nelson, and Pavel Laskov. 2012. “Poisoning Attacks Against Support Vector Machines.” In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012. icml.cc / Omnipress. http://icml.cc/2012/papers/880.pdf.\n\n\n\n\n\n\nFigure 17.22: NightShade’s poisoning effects on Stable Diffusion (Credit: TOMÉ)\n\n\n\nThe process usually involves the following steps:\n\nInjection: The attacker adds incorrect or misleading examples into the training set. These examples are often designed to look normal to cursory inspection but have been carefully crafted to disrupt the learning process.\nTraining: The ML model trains on this manipulated dataset and develops skewed understandings of the data patterns.\nDeployment: Once the model is deployed, the corrupted training leads to flawed decision-making or predictable vulnerabilities the attacker can exploit.\n\nThe impact of data poisoning extends beyond classification errors or accuracy drops. In critical applications like healthcare, such alterations can lead to significant trust and safety issues (Marulli, Marrone, and Verde 2022). Later, we will discuss a few case studies of these issues.\n\nMarulli, Fiammetta, Stefano Marrone, and Laura Verde. 2022. “Sensitivity of Machine Learning Approaches to Fake and Untrusted Data in Healthcare Domain.” Journal of Sensor and Actuator Networks 11 (2): 21. https://doi.org/10.3390/jsan11020021.\n\nOprea, Alina, Anoop Singhal, and Apostol Vassilev. 2022. “Poisoning Attacks Against Machine Learning: Can Machine Learning Be Trustworthy?” Computer 55 (11): 94–99. https://doi.org/10.1109/mc.2022.3190787.\nThere are six main categories of data poisoning (Oprea, Singhal, and Vassilev 2022):\n\nAvailability Attacks: These attacks aim to compromise the overall functionality of a model. They cause it to misclassify most testing samples, rendering the model unusable for practical applications. An example is label flipping, where labels of a specific, targeted class are replaced with labels from a different one.\nTargeted Attacks: In contrast to availability attacks, targeted attacks aim to compromise a small number of the testing samples. So, the effect is localized to a limited number of classes, while the model maintains the same original level of accuracy for the majority of the classes. The targeted nature of the attack requires the attacker to possess knowledge of the model’s classes, making detecting these attacks more challenging.\nBackdoor Attacks: In these attacks, an adversary targets specific patterns in the data. The attacker introduces a backdoor (a malicious, hidden trigger or pattern) into the training data, such as manipulating certain features in structured data or manipulating a pattern of pixels at a fixed position. This causes the model to associate the malicious pattern with specific labels. As a result, when the model encounters test samples that contain a malicious pattern, it makes false predictions.\nSubpopulation Attacks: Attackers selectively choose to compromise a subset of the testing samples while maintaining accuracy on the rest of the samples. You can think of these attacks as a combination of availability and targeted attacks: performing availability attacks (performance degradation) within the scope of a targeted subset. Although subpopulation attacks may seem very similar to targeted attacks, the two have clear differences:\nScope: While targeted attacks target a selected set of samples, subpopulation attacks target a general subpopulation with similar feature representations. For example, in a targeted attack, an actor inserts manipulated images of a ‘speed bump’ warning sign (with carefully crafted perturbations or patterns), which causes an autonomous car to fail to recognize such a sign and slow down. On the other hand, manipulating all samples of people with a British accent so that a speech recognition model would misclassify a British person’s speech is an example of a subpopulation attack.\nKnowledge: While targeted attacks require a high degree of familiarity with the data, subpopulation attacks require less intimate knowledge to be effective.\n\nThe characteristics of data poisoning include:\nSubtle and hard-to-detect manipulations of training data: Data poisoning often involves subtle manipulations of the training data that are carefully crafted to be difficult to detect through casual inspection. Attackers employ sophisticated techniques to ensure that the poisoned samples blend seamlessly with the legitimate data, making them easier to identify with thorough analysis. These manipulations can target specific features or attributes of the data, such as altering numerical values, modifying categorical labels, or introducing carefully designed patterns. The goal is to influence the model’s learning process while evading detection, allowing the poisoned data to subtly corrupt the model’s behavior.\nCan be performed by insiders or external attackers: Data poisoning attacks can be carried out by various actors, including malicious insiders with access to the training data and external attackers who find ways to influence the data collection or preprocessing pipeline. Insiders pose a significant threat because they often have privileged access and knowledge of the system, enabling them to introduce poisoned data without raising suspicions. On the other hand, external attackers may exploit vulnerabilities in data sourcing, crowdsourcing platforms, or data aggregation processes to inject poisoned samples into the training dataset. This highlights the importance of implementing strong access controls, data governance policies, and monitoring mechanisms to mitigate the risk of insider threats and external attacks.\nExploits vulnerabilities in data collection and preprocessing: Data poisoning attacks often exploit vulnerabilities in the machine learning pipeline’s data collection and preprocessing stages. Attackers carefully design poisoned samples to evade common data validation techniques, ensuring that the manipulated data still falls within acceptable ranges, follows expected distributions, or maintains consistency with other features. This allows the poisoned data to pass through data preprocessing steps without detection. Furthermore, poisoning attacks can take advantage of weaknesses in data preprocessing, such as inadequate data cleaning, insufficient outlier detection, or lack of integrity checks. Attackers may also exploit the lack of robust data provenance and lineage tracking mechanisms to introduce poisoned data without leaving a traceable trail. Addressing these vulnerabilities requires rigorous data validation, anomaly detection, and data provenance tracking techniques to ensure the integrity and trustworthiness of the training data.\nDisrupts the learning process and skews model behavior: Data poisoning attacks are designed to disrupt the learning process of machine learning models and skew their behavior towards the attacker’s objectives. The poisoned data is typically manipulated with specific goals, such as skewing the model’s behavior towards certain classes, introducing backdoors, or degrading overall performance. These manipulations are not random but targeted to achieve the attacker’s desired outcomes. By introducing label inconsistencies, where the manipulated samples have labels that do not align with their true nature, poisoning attacks can confuse the model during training and lead to biased or incorrect predictions. The disruption caused by poisoned data can have far-reaching consequences, as the compromised model may make flawed decisions or exhibit unintended behavior when deployed in real-world applications.\nImpacts model performance, fairness, and trustworthiness: Poisoned data in the training dataset can have severe implications for machine learning models’ performance, fairness, and trustworthiness. Poisoned data can degrade the accuracy and performance of the trained model, leading to increased misclassifications or errors in predictions. This can have significant consequences, especially in critical applications where the model’s outputs inform important decisions. Moreover, poisoning attacks can introduce biases and fairness issues, causing the model to make discriminatory or unfair decisions for certain subgroups or classes. This undermines machine learning systems’ ethical and social responsibilities and can perpetuate or amplify existing biases. Furthermore, poisoned data erodes the trustworthiness and reliability of the entire ML system. The model’s outputs become questionable and potentially harmful, leading to a loss of confidence in the system’s integrity. The impact of poisoned data can propagate throughout the entire ML pipeline, affecting downstream components and decisions that rely on the compromised model. Addressing these concerns requires robust data governance, regular model auditing, and ongoing monitoring to detect and mitigate the effects of data poisoning attacks.\n\n\nMechanisms of Data Poisoning\nData poisoning attacks can be carried out through various mechanisms, exploiting different ML pipeline vulnerabilities. These mechanisms allow attackers to manipulate the training data and introduce malicious samples that can compromise the model’s performance, fairness, or integrity. Understanding these mechanisms is crucial for developing effective defenses against data poisoning and ensuring the robustness of ML systems. Data poisoning mechanisms can be broadly categorized based on the attacker’s approach and the stage of the ML pipeline they target. Some common mechanisms include modifying training data labels, altering feature values, injecting carefully crafted malicious samples, exploiting data collection and preprocessing vulnerabilities, manipulating data at the source, poisoning data in online learning scenarios, and collaborating with insiders to manipulate data.\nEach of these mechanisms presents unique challenges and requires different mitigation strategies. For example, detecting label manipulation may involve analyzing the distribution of labels and identifying anomalies (Zhou et al. 2018), while preventing feature manipulation may require secure data preprocessing and anomaly detection techniques (Carta et al. 2020). Defending against insider threats may involve strict access control policies and monitoring of data access patterns. Moreover, the effectiveness of data poisoning attacks often depends on the attacker’s knowledge of the ML system, including the model architecture, training algorithms, and data distribution. Attackers may use adversarial machine learning or data synthesis techniques to craft samples that are more likely to bypass detection and achieve their malicious objectives.\n\nZhou, Peng, Xintong Han, Vlad I. Morariu, and Larry S. Davis. 2018. “Learning Rich Features for Image Manipulation Detection.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1053–61. IEEE. https://doi.org/10.1109/cvpr.2018.00116.\n\nCarta, Salvatore, Alessandro Sebastian Podda, Diego Reforgiato Recupero, and Roberto Saia. 2020. “A Local Feature Engineering Strategy to Improve Network Anomaly Detection.” Future Internet 12 (10): 177. https://doi.org/10.3390/fi12100177.\n\n\n\n\n\n\nFigure 17.23: Garbage In – Garbage Out (Credit: Information Matters)\n\n\n\nModifying training data labels: One of the most straightforward mechanisms of data poisoning is modifying the training data labels. In this approach, the attacker selectively changes the labels of a subset of the training samples to mislead the model’s learning process as shown in Figure 17.23. For example, in a binary classification task, the attacker might flip the labels of some positive samples to negative, or vice versa. By introducing such label noise, the attacker aims to degrade the model’s performance or cause it to make incorrect predictions for specific target instances.\nAltering feature values in training data: Another mechanism of data poisoning involves altering the feature values of the training samples without modifying the labels. The attacker carefully crafts the feature values to introduce specific biases or vulnerabilities into the model. For instance, in an image classification task, the attacker might add imperceptible perturbations to a subset of images, causing the model to learn a particular pattern or association. This type of poisoning can create backdoors or trojans in the trained model, which specific input patterns can trigger.\nInjecting carefully crafted malicious samples: In this mechanism, the attacker creates malicious samples designed to poison the model. These samples are crafted to have a specific impact on the model’s behavior while blending in with the legitimate training data. The attacker might use techniques such as adversarial perturbations or data synthesis to generate poisoned samples that are difficult to detect. The attacker aims to manipulate the model’s decision boundaries by injecting these malicious samples into the training data or introducing targeted misclassifications.\nExploiting data collection and preprocessing vulnerabilities: Data poisoning attacks can also exploit the data collection and preprocessing pipeline vulnerabilities. If the data collection process is not secure or there are weaknesses in the data preprocessing steps, an attacker can manipulate the data before it reaches the training phase. For example, if data is collected from untrusted sources or issues in data cleaning or aggregation, an attacker can introduce poisoned samples or manipulate the data to their advantage.\nManipulating data at the source (e.g., sensor data): In some cases, attackers can manipulate the data at its source, such as sensor data or input devices. By tampering with the sensors or manipulating the environment in which data is collected, attackers can introduce poisoned samples or bias the data distribution. For instance, in a self-driving car scenario, an attacker might manipulate the sensors or the environment to feed misleading information into the training data, compromising the model’s ability to make safe and reliable decisions.\n\n\n\n\n\n\nFigure 17.24: Data Poisoning Attack (Credit: Sikandar)\n\n\n\nPoisoning data in online learning scenarios: Data poisoning attacks can also target ML systems that employ online learning, where the model is continuously updated with new data in real time. In such scenarios, an attacker can gradually inject poisoned samples over time, slowly manipulating the model’s behavior. Online learning systems are particularly vulnerable to data poisoning because they adapt to new data without extensive validation, making it easier for attackers to introduce malicious samples, as shown in Figure 17.24.\nCollaborating with insiders to manipulate data: Sometimes, data poisoning attacks can involve collaboration with insiders with access to the training data. Malicious insiders, such as employees or data providers, can manipulate the data before it is used to train the model. Insider threats are particularly challenging to detect and prevent, as the attackers have legitimate access to the data and can carefully craft the poisoning strategy to evade detection.\nThese are the key mechanisms of data poisoning in ML systems. Attackers often employ these mechanisms to make their attacks more effective and harder to detect. The risk of data poisoning attacks grows as ML systems become increasingly complex and rely on larger datasets from diverse sources. Defending against data poisoning requires a multifaceted approach. ML practitioners and system designers must be aware of the various mechanisms of data poisoning and adopt a comprehensive approach to data security and model resilience. This includes secure data collection, robust data validation, and continuous model performance monitoring. Implementing secure data collection and preprocessing practices is crucial to prevent data poisoning at the source. Data validation and anomaly detection techniques can also help identify and mitigate potential poisoning attempts. Monitoring model performance for signs of data poisoning is also essential to detect and respond to attacks promptly.\n\n\nImpact on ML Systems\nData poisoning attacks can severely affect ML systems, compromising their performance, reliability, and trustworthiness. The impact of data poisoning can manifest in various ways, depending on the attacker’s objectives and the specific mechanism used. Let’s explore each of the potential impacts in detail.\nDegradation of model performance: One of the primary impacts of data poisoning is the degradation of the model’s overall performance. By manipulating the training data, attackers can introduce noise, biases, or inconsistencies that hinder the model’s ability to learn accurate patterns and make reliable predictions. This can reduce accuracy, precision, recall, or other performance metrics. The degradation of model performance can have significant consequences, especially in critical applications such as healthcare, finance, or security, where the reliability of predictions is crucial.\nMisclassification of specific targets: Data poisoning attacks can also be designed to cause the model to misclassify specific target instances. Attackers may introduce carefully crafted poisoned samples similar to the target instances, leading the model to learn incorrect associations. This can result in the model consistently misclassifying the targeted instances, even if it performs well on other inputs. Such targeted misclassification can have severe consequences, such as causing a malware detection system to overlook specific malicious files or leading to the wrong diagnosis in a medical imaging application.\nBackdoors and trojans in trained models: Data poisoning can introduce backdoors or trojans into the trained model. Backdoors are hidden functionalities that allow attackers to trigger specific behaviors or bypass normal authentication mechanisms. On the other hand, Trojans are malicious components embedded within the model that can activate specific input patterns. By poisoning the training data, attackers can create models that appear to perform normally but contain hidden vulnerabilities that can be exploited later. Backdoors and trojans can compromise the integrity and security of the ML system, allowing attackers to gain unauthorized access, manipulate predictions, or exfiltrate sensitive information.\nBiased or unfair model outcomes: Data poisoning attacks can introduce biases or unfairness into the model’s predictions. By manipulating the training data distribution or injecting samples with specific biases, attackers can cause the model to learn and perpetuate discriminatory patterns. This can lead to unfair treatment of certain groups or individuals based on sensitive attributes such as race, gender, or age. Biased models can have severe societal implications, reinforcing existing inequalities and discriminatory practices. Ensuring fairness and mitigating biases is crucial for building trustworthy and ethical ML systems.\nIncreased false positives or false negatives: Data poisoning can also impact the model’s ability to correctly identify positive or negative instances, leading to increased false positives or false negatives. False positives occur when the model incorrectly identifies a negative instance as positive, while false negatives happen when a positive instance is misclassified as negative. The consequences of increased false positives or false negatives can be significant depending on the application. For example, in a fraud detection system, high false positives can lead to unnecessary investigations and customer frustration, while high false negatives can allow fraudulent activities to go undetected.\nCompromised system reliability and trustworthiness: Data poisoning attacks can undermine ML systems’ overall reliability and trustworthiness. When models are trained on poisoned data, their predictions become reliable and trustworthy. This can erode user confidence in the system and lead to a loss of trust in the decisions made by the model. In critical applications where ML systems are relied upon for decision-making, such as autonomous vehicles or medical diagnosis, compromised reliability can have severe consequences, putting lives and property at risk.\nAddressing the impact of data poisoning requires a proactive approach to data security, model testing, and monitoring. Organizations must implement robust measures to ensure the integrity and quality of training data, employ techniques to detect and mitigate poisoning attempts, and continuously monitor the performance and behavior of deployed models. Collaboration between ML practitioners, security experts, and domain specialists is essential to develop comprehensive strategies for preventing and responding to data poisoning attacks.\n\nCase Study 1\nIn 2017, researchers demonstrated a data poisoning attack against a popular toxicity classification model called Perspective (Hosseini et al. 2017). This ML model detects toxic comments online.\n\nHosseini, Hossein, Sreeram Kannan, Baosen Zhang, and Radha Poovendran. 2017. “Deceiving Google’s Perspective Api Built for Detecting Toxic Comments.” ArXiv Preprint abs/1702.08138. https://arxiv.org/abs/1702.08138.\nThe researchers added synthetically generated toxic comments with slight misspellings and grammatical errors to the model’s training data. This slowly corrupted the model, causing it to misclassify increasing numbers of severely toxic inputs as non-toxic over time.\nAfter retraining on the poisoned data, the model’s false negative rate increased from 1.4% to 27% - allowing extremely toxic comments to bypass detection. The researchers warned this stealthy data poisoning could enable the spread of hate speech, harassment, and abuse if deployed against real moderation systems.\nThis case highlights how data poisoning can degrade model accuracy and reliability. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains.\n\n\nCase Study 2\n\n\n\n\n\n\nFigure 17.25: Samples of dirty-label poison data regarding mismatched text/image pairs (Credit: Shan)\n\n\n\nInterestingly enough, data poisoning attacks are not always malicious (Shan et al. 2023). Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, utilizes data poisoning to help artists protect their art against scraping and copyright violations by generative AI models. Artists can use the tool to make subtle modifications to their images before uploading them online, as shown in Figure 17.25.\nWhile these changes are indiscernible to the human eye, they can significantly disrupt the performance of generative AI models when incorporated into the training data. Generative models can be manipulated to generate hallucinations and weird images. For example, with only 300 poisoned images, the University of Chicago researchers could trick the latest Stable Diffusion model into generating images of dogs that look like cats or images of cows when prompted for cars.\nAs the number of poisoned images on the internet increases, the performance of the models that use scraped data will deteriorate exponentially. First, the poisoned data is hard to detect and requires manual elimination. Second, the \"poison\" spreads quickly to other labels because generative models rely on connections between words and concepts as they generate images. So a poisoned image of a \"car\" could spread into generated images associated with words like \"truck,” \"train,” \" bus,” etc.\nOn the other hand, this tool can be used maliciously and can affect legitimate applications of the generative models. This shows the very challenging and novel nature of machine learning attacks.\nFigure 17.26 demonstrates the effects of different levels of data poisoning (50 samples, 100 samples, and 300 samples of poisoned images) on generating images in different categories. Notice how the images start deforming and deviating from the desired category. For example, after 300 poison samples, a car prompt generates a cow.\n\n\n\n\n\n\nFigure 17.26: Data poisoning (Credit: Shan et al. (2023))\n\n\nShan, Shawn, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y Zhao. 2023. “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models.” ArXiv Preprint abs/2310.13828. https://arxiv.org/abs/2310.13828.\n\n\n\n\n\n\n\n\nExercise 17.3: Poisoning Attacks\n\n\n\n\n\nGet ready to explore the dark side of AI security! In this Colab, you’ll learn about data poisoning – how bad data can trick AI models into making wrong decisions. We’ll focus on a real-world attack against a Support Vector Machine (SVM), observing how the AI’s behavior changes under attack. This hands-on exercise will highlight why protecting AI systems is crucial, especially as they become more integrated into our lives. Think like a hacker, understand the vulnerability, and brainstorm how to defend our AI systems!\n\n\n\n\n\n\n\n\n17.4.3 Distribution Shifts\n\nDefinition and Characteristics\nDistribution shift refers to the phenomenon where the data distribution encountered by an ML model during deployment (inference) differs from the distribution it was trained on, as shown in Figure 17.27. This is not so much an attack as it is that the model’s robustness will vary over time. In other words, the data’s statistical properties, patterns, or underlying assumptions can change between the training and test phases.\n\n\n\n\n\n\nFigure 17.27: The curly brackets enclose the distribution shift between the environments. Here, z stands for the spurious feature, and y stands for label class (Credit: Xin)\n\n\n\nThe key characteristics of distribution shift include:\nDomain mismatch: The input data during inference comes from a different domain or distribution than the training data. When the input data during inference comes from a domain or distribution different from the training data, it can significantly affect the model’s performance. This is because the model has learned patterns and relationships specific to the training domain, and when applied to a different domain, those learned patterns may not hold. For example, consider a sentiment analysis model trained on movie reviews. Suppose this model is applied to analyze sentiment in tweets. In that case, it may need help to accurately classify the sentiment because the language, grammar, and context of tweets can differ from movie reviews. This domain mismatch can result in poor performance and unreliable predictions, limiting the model’s practical utility.\nTemporal drift: The data distribution evolves, leading to a gradual or sudden shift in the input characteristics. Temporal drift is important because ML models are often deployed in dynamic environments where the data distribution can change over time. If the model is not updated or adapted to these changes, its performance can gradually degrade. For instance, the patterns and behaviors associated with fraudulent activities may evolve in a fraud detection system as fraudsters adapt their techniques. If the model is not retrained or updated to capture these new patterns, it may fail to detect new types of fraud effectively. Temporal drift can lead to a decline in the model’s accuracy and reliability over time, making monitoring and addressing this type of distribution shift crucial.\nContextual changes: The ML model’s context can vary, resulting in different data distributions based on factors such as location, user behavior, or environmental conditions. Contextual changes matter because ML models are often deployed in various contexts or environments that can have different data distributions. If the model cannot generalize well to these different contexts, its performance may improve. For example, consider a computer vision model trained to recognize objects in a controlled lab environment. When deployed in a real-world setting, factors such as lighting conditions, camera angles, or background clutter can vary significantly, leading to a distribution shift. If the model is robust to these contextual changes, it may be able to accurately recognize objects in the new environment, limiting its practical utility.\nUnrepresentative training data: The training data may only partially capture the variability and diversity of the real-world data encountered during deployment. Unrepresentative training data can lead to biased or skewed models that perform poorly on real-world data. Suppose the training data needs to capture the variability and diversity of the real-world data adequately. In that case, the model may learn patterns specific to the training set but needs to generalize better to new, unseen data. This can result in poor performance, biased predictions, and limited model applicability. For instance, if a facial recognition model is trained primarily on images of individuals from a specific demographic group, it may struggle to accurately recognize faces from other demographic groups when deployed in a real-world setting. Ensuring that the training data is representative and diverse is crucial for building models that can generalize well to real-world scenarios.\n\n\n\n\n\n\nFigure 17.28: Concept drift refers to a change in data patterns and relationships over time (Credit: Evidently AI)\n\n\n\nDistribution shift can manifest in various forms, such as:\nCovariate shift: The distribution of the input features (covariates) changes while the conditional distribution of the target variable given the input remains the same. Covariate shift matters because it can impact the model’s ability to make accurate predictions when the input features (covariates) differ between the training and test data. Even if the relationship between the input features and the target variable remains the same, a change in the distribution of the input features can affect the model’s performance. For example, consider a model trained to predict housing prices based on features like square footage, number of bedrooms, and location. Suppose the distribution of these features in the test data significantly differs from the training data (e.g., the test data contains houses with much larger square footage). In that case, the model’s predictions may become less accurate. Addressing covariate shifts is important to ensure the model’s robustness and reliability when applied to new data.\nConcept drift: The relationship between the input features and the target variable changes over time, altering the underlying concept the model is trying to learn, as shown in Figure 17.28. Concept drift is important because it indicates changes in the fundamental relationship between the input features and the target variable over time. When the underlying concept that the model is trying to learn shifts, its performance can deteriorate if not adapted to the new concept. For instance, in a customer churn prediction model, the factors influencing customer churn may evolve due to market conditions, competitor offerings, or customer preferences. If the model is not updated to capture these changes, its predictions may become less accurate and irrelevant. Detecting and adapting to concept drift is crucial to maintaining the model’s effectiveness and alignment with evolving real-world concepts.\nDomain generalization: The model must generalize to unseen domains or distributions not present during training. Domain generalization is important because it enables ML models to be applied to new, unseen domains without requiring extensive retraining or adaptation. In real-world scenarios, training data that covers all possible domains or distributions that the model may encounter is often infeasible. Domain generalization techniques aim to learn domain-invariant features or models that can generalize well to new domains. For example, consider a model trained to classify images of animals. If the model can learn features invariant to different backgrounds, lighting conditions, or poses, it can generalize well to classify animals in new, unseen environments. Domain generalization is crucial for building models that can be deployed in diverse and evolving real-world settings.\nThe presence of a distribution shift can significantly impact the performance and reliability of ML models, as the models may need help generalizing well to the new data distribution. Detecting and adapting to distribution shifts is crucial to ensure ML systems’ robustness and practical utility in real-world scenarios.\n\n\nMechanisms of Distribution Shifts\nThe mechanisms of distribution shift, such as changes in data sources, temporal evolution, domain-specific variations, selection bias, feedback loops, and adversarial manipulations, are important to understand because they help identify the underlying causes of distribution shift. By understanding these mechanisms, practitioners can develop targeted strategies to mitigate their impact and improve the model’s robustness. Here are some common mechanisms:\n\n\n\n\n\n\nFigure 17.29: Temporal evolution (Credit: Białek)\n\n\n\nChanges in data sources: Distribution shifts can occur when the data sources used for training and inference differ. For example, if a model is trained on data from one sensor but deployed on data from another sensor with different characteristics, it can lead to a distribution shift.\nTemporal evolution: Over time, the underlying data distribution can evolve due to changes in user behavior, market dynamics, or other temporal factors. For instance, in a recommendation system, user preferences may shift over time, leading to a distribution shift in the input data, as shown in Figure 17.29.\nDomain-specific variations: Different domains or contexts can have distinct data distributions. A model trained on data from one domain may only generalize well to another domain with appropriate adaptation techniques. For example, an image classification model trained on indoor scenes may struggle when applied to outdoor scenes.\nSelection bias: A Distribution shift can arise from selection bias during data collection or sampling. If the training data does not represent the true population or certain subgroups are over- or underrepresented, this can lead to a mismatch between the training and test distributions.\nFeedback loops: In some cases, the predictions or actions taken by an ML model can influence future data distribution. For example, in a dynamic pricing system, the prices set by the model can impact customer behavior, leading to a shift in the data distribution over time.\nAdversarial manipulations: Adversaries can intentionally manipulate the input data to create a distribution shift and deceive the ML model. By introducing carefully crafted perturbations or generating out-of-distribution samples, attackers can exploit the model’s vulnerabilities and cause it to make incorrect predictions.\nUnderstanding the mechanisms of distribution shift is important for developing effective strategies to detect and mitigate its impact on ML systems. By identifying the sources and characteristics of the shift, practitioners can design appropriate techniques, such as domain adaptation, transfer learning, or continual learning, to improve the model’s robustness and performance under distributional changes.\n\n\nImpact on ML Systems\nDistribution shifts can significantly negatively impact the performance and reliability of ML systems. Here are some key ways in which distribution shift can affect ML models:\nDegraded predictive performance: When the data distribution encountered during inference differs from the training distribution, the model’s predictive accuracy can deteriorate. The model may need help generalizing the new data well, leading to increased errors and suboptimal performance.\nReduced reliability and trustworthiness: Distribution shift can undermine the reliability and trustworthiness of ML models. If the model’s predictions become unreliable or inconsistent due to the shift, users may lose confidence in the system’s outputs, leading to potential misuse or disuse of the model.\nBiased predictions: Distribution shift can introduce biases in the model’s predictions. If the training data does not represent the real-world distribution or certain subgroups are underrepresented, the model may make biased predictions that discriminate against certain groups or perpetuate societal biases.\nIncreased uncertainty and risk: Distribution shift introduces additional uncertainty and risk into the ML system. The model’s behavior and performance may become less predictable, making it challenging to assess its reliability and suitability for critical applications. This uncertainty can lead to increased operational risks and potential failures.\nAdaptability challenges: ML models trained on a specific data distribution may need help to adapt to changing environments or new domains. The lack of adaptability can limit the model’s usefulness and applicability in dynamic real-world scenarios where the data distribution evolves.\nMaintenance and update difficulties: Distribution shift can complicate the maintenance and updating of ML models. As the data distribution changes, the model may require frequent retraining or fine-tuning to maintain its performance. This can be time-consuming and resource-intensive, especially if the shift occurs rapidly or continuously.\nVulnerability to adversarial attacks: Distribution shift can make ML models more vulnerable to adversarial attacks. Adversaries can exploit the model’s sensitivity to distributional changes by crafting adversarial examples outside the training distribution, causing the model to make incorrect predictions or behave unexpectedly.\nTo mitigate the impact of distribution shifts, it is crucial to develop robust ML systems that detect and adapt to distributional changes. Techniques such as domain adaptation, transfer learning, and continual learning can help improve the model’s generalization ability across different distributions. ML model monitoring, testing, and updating are also necessary to ensure their performance and reliability during distribution shifts.\n\n\n\n17.4.4 Detection and Mitigation\n\nAdversarial Attacks\nAs you may recall from above, adversarial attacks pose a significant threat to the robustness and reliability of ML systems. These attacks involve crafting carefully designed inputs, known as adversarial examples, to deceive ML models and cause them to make incorrect predictions. To safeguard ML systems against adversarial attacks, developing effective techniques for detecting and mitigating these threats is crucial.\n\nAdversarial Example Detection Techniques\nDetecting adversarial examples is the first line of defense against adversarial attacks. Several techniques have been proposed to identify and flag suspicious inputs that may be adversarial.\nStatistical methods aim to detect adversarial examples by analyzing the statistical properties of the input data. These methods often compare the input data distribution to a reference distribution, such as the training data distribution or a known benign distribution. Techniques like the Kolmogorov-Smirnov (Berger and Zhou 2014) test or the Anderson-Darling test can be used to measure the discrepancy between the distributions and flag inputs that deviate significantly from the expected distribution.\n\nBerger, Vance W, and YanYan Zhou. 2014. “Kolmogorovsmirnov Test: Overview.” Wiley Statsref: Statistics Reference Online.\nKernel density estimation (KDE) is a non-parametric technique used to estimate the probability density function of a dataset. In the context of adversarial example detection, KDE can be used to estimate the density of benign examples in the input space. Adversarial examples often lie in low-density regions and can be detected by comparing their estimated density to a threshold. Inputs with an estimated density below the threshold are flagged as potential adversarial examples.\nAnother technique is feature squeezing (Panda, Chakraborty, and Roy 2019), which reduces the complexity of the input space by applying dimensionality reduction or discretization. The idea behind feature squeezing is that adversarial examples often rely on small, imperceptible perturbations that can be eliminated or reduced through these transformations. Inconsistencies can be detected by comparing the model’s predictions on the original input and the squeezed input, indicating the presence of adversarial examples.\n\nPanda, Priyadarshini, Indranil Chakraborty, and Kaushik Roy. 2019. “Discretization Based Solutions for Secure Machine Learning Against Adversarial Attacks.” #IEEE_O_ACC# 7: 70157–68. https://doi.org/10.1109/access.2019.2919463.\nModel uncertainty estimation techniques aim to quantify the confidence or uncertainty associated with a model’s predictions. Adversarial examples often exploit regions of high uncertainty in the model’s decision boundary. By estimating the uncertainty using techniques like Bayesian neural networks, dropout-based uncertainty estimation, or ensemble methods, inputs with high uncertainty can be flagged as potential adversarial examples.\n\n\nAdversarial Defense Strategies\nOnce adversarial examples are detected, various defense strategies can be employed to mitigate their impact and improve the robustness of ML models.\nAdversarial training is a technique that involves augmenting the training data with adversarial examples and retraining the model on this augmented dataset. Exposing the model to adversarial examples during training teaches it to classify them correctly and becomes more robust to adversarial attacks. Adversarial training can be performed using various attack methods, such as the Fast Gradient Sign Method (FGSM) or Projected Gradient Descent (PGD) (Madry et al. 2017).\n\nMadry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. “Towards Deep Learning Models Resistant to Adversarial Attacks.” arXiv Preprint arXiv:1706.06083.\n\nPapernot, Nicolas, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. “Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks.” In 2016 IEEE Symposium on Security and Privacy (SP), 582–97. IEEE; IEEE. https://doi.org/10.1109/sp.2016.41.\nDefensive distillation (Papernot et al. 2016) is a technique that trains a second model (the student model) to mimic the behavior of the original model (the teacher model). The student model is trained on the soft labels produced by the teacher model, which are less sensitive to small perturbations. Using the student model for inference can reduce the impact of adversarial perturbations, as the student model learns to generalize better and is less sensitive to adversarial noise.\nInput preprocessing and transformation techniques aim to remove or mitigate the effect of adversarial perturbations before feeding the input to the ML model. These techniques include image denoising, JPEG compression, random resizing, padding, or applying random transformations to the input data. By reducing the impact of adversarial perturbations, these preprocessing steps can help improve the model’s robustness to adversarial attacks.\nEnsemble methods combine multiple models to make more robust predictions. The ensemble can reduce the impact of adversarial attacks by using a diverse set of models with different architectures, training data, or hyperparameters. Adversarial examples that fool one model may not fool others in the ensemble, leading to more reliable and robust predictions. Model diversification techniques, such as using different preprocessing techniques or feature representations for each model in the ensemble, can further enhance the robustness.\n\n\nRobustness Evaluation and Testing\nConduct thorough evaluation and testing to assess the effectiveness of adversarial defense techniques and measure the robustness of ML models.\nAdversarial robustness metrics quantify the model’s resilience to adversarial attacks. These metrics can include the model’s accuracy on adversarial examples, the average distortion required to fool the model, or the model’s performance under different attack strengths. By comparing these metrics across different models or defense techniques, practitioners can assess and compare their robustness levels.\nStandardized adversarial attack benchmarks and datasets provide a common ground for evaluating and comparing the robustness of ML models. These benchmarks include datasets with pre-generated adversarial examples and tools and frameworks for generating adversarial attacks. Examples of popular adversarial attack benchmarks include the MNIST-C, CIFAR-10-C, and ImageNet-C (Hendrycks and Dietterich 2019) datasets, which contain corrupted or perturbed versions of the original datasets.\n\nHendrycks, Dan, and Thomas Dietterich. 2019. “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations.” arXiv Preprint arXiv:1903.12261.\nPractitioners can develop more robust and resilient ML systems by leveraging these adversarial example detection techniques, defense strategies, and robustness evaluation methods. However, it is important to note that adversarial robustness is an ongoing research area, and no single technique provides complete protection against all types of adversarial attacks. A comprehensive approach that combines multiple defense mechanisms and regular testing is essential to maintain the security and reliability of ML systems in the face of evolving adversarial threats.\n\n\n\nData Poisoning\nRecall that data poisoning is an attack that targets the integrity of the training data used to build ML models. By manipulating or corrupting the training data, attackers can influence the model’s behavior and cause it to make incorrect predictions or perform unintended actions. Detecting and mitigating data poisoning attacks is crucial to ensure the trustworthiness and reliability of ML systems, as shown in Figure 17.30.\n\nAnomaly Detection Techniques for Identifying Poisoned Data\n\n\n\n\n\n\nFigure 17.30: Malicious data injection (Credit: Li)\n\n\n\nStatistical outlier detection methods identify data points that deviate significantly from most data. These methods assume that poisoned data instances are likely to be statistical outliers. Techniques such as the Z-score method, Tukey’s method, or the [Mahalanobis] distance can be used to measure the deviation of each data point from the central tendency of the dataset. Data points that exceed a predefined threshold are flagged as potential outliers and considered suspicious for data poisoning.\nClustering-based methods group similar data points together based on their features or attributes. The assumption is that poisoned data instances may form distinct clusters or lie far away from the normal data clusters. By applying clustering algorithms like K-means, DBSCAN, or hierarchical clustering, anomalous clusters or data points that do not belong to any cluster can be identified. These anomalous instances are then treated as potentially poisoned data.\n\n\n\n\n\n\nFigure 17.31: Autoencoder (Credit: Dertat)\n\n\n\nAutoencoders are neural networks trained to reconstruct the input data from a compressed representation, as shown in Figure 17.31. They can be used for anomaly detection by learning the normal patterns in the data and identifying instances that deviate from them. During training, the autoencoder is trained on clean, unpoisoned data. At inference time, the reconstruction error for each data point is computed. Data points with high reconstruction errors are considered abnormal and potentially poisoned, as they do not conform to the learned normal patterns.\n\n\nData Sanitization and Preprocessing Techniques\nData poisoning can be avoided by cleaning data, which involves identifying and removing or correcting noisy, incomplete, or inconsistent data points. Techniques such as data deduplication, missing value imputation, and outlier removal can be applied to improve the quality of the training data. By eliminating or filtering out suspicious or anomalous data points, the impact of poisoned instances can be reduced.\nData validation involves verifying the integrity and consistency of the training data. This can include checking for data type consistency, range validation, and cross-field dependencies. By defining and enforcing data validation rules, anomalous or inconsistent data points indicative of data poisoning can be identified and flagged for further investigation.\nData provenance and lineage tracking involve maintaining a record of data’s origin, transformations, and movements throughout the ML pipeline. By documenting the data sources, preprocessing steps, and any modifications made to the data, practitioners can trace anomalies or suspicious patterns back to their origin. This helps identify potential points of data poisoning and facilitates the investigation and mitigation process.\n\n\nRobust Training Techniques\nRobust optimization techniques can be used to modify the training objective to minimize the impact of outliers or poisoned instances. This can be achieved by using robust loss functions less sensitive to extreme values, such as the Huber loss or the modified Huber loss. Regularization techniques, such as L1 or L2 regularization, can also help in reducing the model’s sensitivity to poisoned data by constraining the model’s complexity and preventing overfitting.\nRobust loss functions are designed to be less sensitive to outliers or noisy data points. Examples include the modified Huber loss, the Tukey loss (Beaton and Tukey 1974), and the trimmed mean loss. These loss functions down-weight or ignore the contribution of abnormal instances during training, reducing their impact on the model’s learning process. Robust objective functions, such as the minimax or distributionally robust objective, aim to optimize the model’s performance under worst-case scenarios or in the presence of adversarial perturbations.\n\nBeaton, Albert E., and John W. Tukey. 1974. “The Fitting of Power Series, Meaning Polynomials, Illustrated on Band-Spectroscopic Data.” Technometrics 16 (2): 147. https://doi.org/10.2307/1267936.\nData augmentation techniques involve generating additional training examples by applying random transformations or perturbations to the existing data Figure 17.32. This helps in increasing the diversity and robustness of the training dataset. By introducing controlled variations in the data, the model becomes less sensitive to specific patterns or artifacts that may be present in poisoned instances. Randomization techniques, such as random subsampling or bootstrap aggregating, can also help reduce the impact of poisoned data by training multiple models on different subsets of the data and combining their predictions.\n\n\n\n\n\n\nFigure 17.32: An image of the number “3” in original form and with basic augmentations applied.\n\n\n\n\n\nSecure and Trusted Data Sourcing\nImplementing the best data collection and curation practices can help mitigate the risk of data poisoning. This includes establishing clear data collection protocols, verifying the authenticity and reliability of data sources, and conducting regular data quality assessments. Sourcing data from trusted and reputable providers and following secure data handling practices can reduce the likelihood of introducing poisoned data into the training pipeline.\nStrong data governance and access control mechanisms are essential to prevent unauthorized modifications or tampering with the training data. This involves defining clear roles and responsibilities for data access, implementing access control policies based on the principle of least privilege, and monitoring and logging data access activities. By restricting access to the training data and maintaining an audit trail, potential data poisoning attempts can be detected and investigated.\nDetecting and mitigating data poisoning attacks requires a multifaceted approach that combines anomaly detection, data sanitization, robust training techniques, and secure data sourcing practices. By implementing these measures, ML practitioners can enhance the resilience of their models against data poisoning and ensure the integrity and trustworthiness of the training data. However, it is important to note that data poisoning is an active area of research, and new attack vectors and defense mechanisms continue to emerge. Staying informed about the latest developments and adopting a proactive and adaptive approach to data security is crucial for maintaining the robustness of ML systems.\n\n\n\nDistribution Shifts\n\nDetecting and Mitigating Distribution Shifts\nRecall that distribution shifts occur when the data distribution encountered by a machine learning (ML) model during deployment differs from the distribution it was trained on. These shifts can significantly impact the model’s performance and generalization ability, leading to suboptimal or incorrect predictions. Detecting and mitigating distribution shifts is crucial to ensure the robustness and reliability of ML systems in real-world scenarios.\n\n\nDetection Techniques for Distribution Shifts\nStatistical tests can be used to compare the distributions of the training and test data to identify significant differences. Techniques such as the Kolmogorov-Smirnov test or the Anderson-Darling test measure the discrepancy between two distributions and provide a quantitative assessment of the presence of distribution shift. By applying these tests to the input features or the model’s predictions, practitioners can detect if there is a statistically significant difference between the training and test distributions.\nDivergence metrics quantify the dissimilarity between two probability distributions. Commonly used divergence metrics include the Kullback-Leibler (KL) divergence and the [Jensen-Shannon (JS)] divergence. By calculating the divergence between the training and test data distributions, practitioners can assess the extent of the distribution shift. High divergence values indicate a significant difference between the distributions, suggesting the presence of a distribution shift.\nUncertainty quantification techniques, such as Bayesian neural networks or ensemble methods, can estimate the uncertainty associated with the model’s predictions. When a model is applied to data from a different distribution, its predictions may have higher uncertainty. By monitoring the uncertainty levels, practitioners can detect distribution shifts. If the uncertainty consistently exceeds a predetermined threshold for test samples, it suggests that the model is operating outside its trained distribution.\nIn addition, domain classifiers are trained to distinguish between different domains or distributions. Practitioners can detect distribution shifts by training a classifier to differentiate between the training and test domains. If the domain classifier achieves high accuracy in distinguishing between the two domains, it indicates a significant difference in the underlying distributions. The performance of the domain classifier serves as a measure of the distribution shift.\n\n\nMitigation Techniques for Distribution Shifts\n\n\n\n\n\n\nFigure 17.33: Transfer learning (Credit: Bhavsar)\n\n\n\nTransfer learning leverages knowledge gained from one domain to improve performance in another, as shown in Figure 17.33. By using pre-trained models or transferring learned features from a source domain to a target domain, transfer learning can help mitigate the impact of distribution shifts. The pre-trained model can be fine-tuned on a small amount of labeled data from the target domain, allowing it to adapt to the new distribution. Transfer learning is particularly effective when the source and target domains share similar characteristics or when labeled data in the target domain is scarce.\nContinual learning, also known as lifelong learning, enables ML models to learn continuously from new data distributions while retaining knowledge from previous distributions. Techniques such as elastic weight consolidation (EWC) (Kirkpatrick et al. 2017) or gradient episodic memory (GEM) (Lopez-Paz and Ranzato 2017) allow models to adapt to evolving data distributions over time. These techniques aim to balance the plasticity of the model (ability to learn from new data) with the stability of the model (retaining previously learned knowledge). By incrementally updating the model with new data and mitigating catastrophic forgetting, continual learning helps models stay robust to distribution shifts.\n\nKirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, et al. 2017. “Overcoming Catastrophic Forgetting in Neural Networks.” Proc. Natl. Acad. Sci. 114 (13): 3521–26. https://doi.org/10.1073/pnas.1611835114.\n\nLopez-Paz, David, and Marc’Aurelio Ranzato. 2017. “Gradient Episodic Memory for Continual Learning.” Adv Neural Inf Process Syst 30.\nData augmentation techniques, such as those we have seen previously, involve applying transformations or perturbations to the existing training data to increase its diversity and improve the model’s robustness to distribution shifts. By introducing variations in the data, such as rotations, translations, scaling, or adding noise, data augmentation helps the model learn invariant features and generalize better to unseen distributions. Data augmentation can be performed during training and inference to enhance the model’s ability to handle distribution shifts.\nEnsemble methods combine multiple models to make predictions more robust to distribution shifts. By training models on different subsets of the data, using different algorithms, or with different hyperparameters, ensemble methods can capture diverse aspects of the data distribution. When presented with a shifted distribution, the ensemble can leverage the strengths of individual models to make more accurate and stable predictions. Techniques like bagging, boosting, or stacking can create effective ensembles.\nRegularly updating models with new data from the target distribution is crucial to mitigate the impact of distribution shifts. As the data distribution evolves, models should be retrained or fine-tuned on the latest available data to adapt to the changing patterns. Monitoring model performance and data characteristics can help detect when an update is necessary. By keeping the models up to date, practitioners can ensure they remain relevant and accurate in the face of distribution shifts.\nEvaluating models using robust metrics less sensitive to distribution shifts can provide a more reliable assessment of model performance. Metrics such as the area under the precision-recall curve (AUPRC) or the F1 score are more robust to class imbalance and can better capture the model’s performance across different distributions. Additionally, using domain-specific evaluation metrics that align with the desired outcomes in the target domain can provide a more meaningful measure of the model’s effectiveness.\nDetecting and mitigating distribution shifts is an ongoing process that requires continuous monitoring, adaptation, and improvement. By employing a combination of detection techniques and mitigation strategies, ML practitioners can proactively identify and address distribution shifts, ensuring the robustness and reliability of their models in real-world deployments. It is important to note that distribution shifts can take various forms and may require domain-specific approaches depending on the nature of the data and the application. Staying informed about the latest research and best practices in handling distribution shifts is essential for building resilient ML systems.", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1947,7 +1947,7 @@ "section": "17.5 Software Faults", "text": "17.5 Software Faults\n\nDefinition and Characteristics\nSoftware faults refer to defects, errors, or bugs in the runtime software frameworks and components that support the execution and deployment of ML models (Myllyaho et al. 2022). These faults can arise from various sources, such as programming mistakes, design flaws, or compatibility issues (H. Zhang 2008), and can have significant implications for ML systems’ performance, reliability, and security. Software faults in ML frameworks exhibit several key characteristics:\n\nMyllyaho, Lalli, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, and Tommi Mikkonen. 2022. “On Misbehaviour and Fault Tolerance in Machine Learning Systems.” J. Syst. Software 183 (January): 111096. https://doi.org/10.1016/j.jss.2021.111096.\n\nZhang, Hongyu. 2008. “On the Distribution of Software Faults.” IEEE Trans. Software Eng. 34 (2): 301–2. https://doi.org/10.1109/tse.2007.70771.\n\nDiversity: Software faults can manifest in different forms, ranging from simple logic and syntax mistakes to more complex issues like memory leaks, race conditions, and integration problems. The variety of fault types adds to the challenge of detecting and mitigating them effectively.\nPropagation: In ML systems, software faults can propagate through the various layers and components of the framework. A fault in one module can trigger a cascade of errors or unexpected behavior in other parts of the system, making it difficult to pinpoint the root cause and assess the full impact of the fault.\nIntermittency: Some software faults may exhibit intermittent behavior, occurring sporadically or under specific conditions. These faults can be particularly challenging to reproduce and debug, as they may manifest inconsistently during testing or normal operation.\nInteraction with ML models: Software faults in ML frameworks can interact with the trained models in subtle ways. For example, a fault in the data preprocessing pipeline may introduce noise or bias into the model’s inputs, leading to degraded performance or incorrect predictions. Similarly, faults in the model serving component may cause inconsistencies between the training and inference environments.\nImpact on system properties: Software faults can compromise various desirable properties of ML systems, such as performance, scalability, reliability, and security. Faults may lead to slowdowns, crashes, incorrect outputs, or vulnerabilities that attackers can exploit.\nDependency on external factors: The occurrence and impact of software faults in ML frameworks often depend on external factors, such as the choice of hardware, operating system, libraries, and configurations. Compatibility issues and version mismatches can introduce faults that are difficult to anticipate and mitigate.\n\nUnderstanding the characteristics of software faults in ML frameworks is crucial for developing effective fault prevention, detection, and mitigation strategies. By recognizing the diversity, propagation, intermittency, and impact of software faults, ML practitioners can design more robust and reliable systems resilient to these issues.\n\n\nMechanisms of Software Faults in ML Frameworks\nMachine learning frameworks, such as TensorFlow, PyTorch, and sci-kit-learn, provide powerful tools and abstractions for building and deploying ML models. However, these frameworks are not immune to software faults that can impact ML systems’ performance, reliability, and correctness. Let’s explore some of the common software faults that can occur in ML frameworks:\nMemory Leaks and Resource Management Issues: Improper memory management, such as failing to release memory or close file handles, can lead to memory leaks and resource exhaustion over time. This issue is compounded by inefficient memory usage, where creating unnecessary copies of large tensors or not leveraging memory-efficient data structures can cause excessive memory consumption and degrade system performance. Additionally, failing to manage GPU memory properly can result in out-of-memory errors or suboptimal utilization of GPU resources, further exacerbating the problem as shown in Figure 17.34.\n\n\n\n\n\n\nFigure 17.34: Example of GPU out-of-the-memory and suboptimal utilization issues\n\n\n\nSynchronization and Concurrency Problems: Incorrect synchronization between threads or processes can lead to race conditions, deadlocks, or inconsistent behavior in multi-threaded or distributed ML systems. This issue is often tied to improper handling of asynchronous operations, such as non-blocking I/O or parallel data loading, which can cause synchronization issues and impact the correctness of the ML pipeline. Moreover, proper coordination and communication between distributed nodes in a cluster can result in consistency or stale data during training or inference, compromising the reliability of the ML system.\nCompatibility Issues: Mismatches between the versions of ML frameworks, libraries, or dependencies can introduce compatibility problems and runtime errors. Upgrading or changing the versions of underlying libraries without thoroughly testing the impact on the ML system can lead to unexpected behavior or breakages. Furthermore, inconsistencies between the training and deployment environments, such as differences in hardware, operating systems, or package versions, can cause compatibility issues and affect the reproducibility of ML models, making it challenging to ensure consistent performance across different platforms.\nNumerical Instability and Precision Errors: Inadequate handling of numerical instabilities, such as division by zero, underflow, or overflow, can lead to incorrect calculations or convergence issues during training. This problem is compounded by insufficient precision or rounding errors, which can accumulate over time and impact the accuracy of the ML models, especially in deep learning architectures with many layers. Moreover, improper scaling or normalization of input data can cause numerical instabilities and affect the convergence and performance of optimization algorithms, resulting in suboptimal or unreliable model performance.\nInadequate Error Handling and Exception Management: Proper error handling and exception management can prevent ML systems from crashing or behaving unexpectedly when encountering exceptional conditions or invalid inputs. Failing to catch and handle specific exceptions or relying on generic exception handling can make it difficult to diagnose and recover from errors gracefully, leading to system instability and reduced reliability. Furthermore, incomplete or misleading error messages can hinder the ability to effectively debug and resolve software faults in ML frameworks, prolonging the time required to identify and fix issues.\n\n\nImpact on ML Systems\nSoftware faults in machine learning frameworks can have significant and far-reaching impacts on ML systems’ performance, reliability, and security. Let’s explore the various ways in which software faults can affect ML systems:\nPerformance Degradation and System Slowdowns: Memory leaks and inefficient resource management can lead to gradual performance degradation over time as the system becomes increasingly memory-constrained and spends more time on garbage collection or memory swapping (Maas et al. 2024). This issue is compounded by synchronization issues and concurrency bugs, which can cause delays, reduced throughput, and suboptimal utilization of computational resources, especially in multi-threaded or distributed ML systems. Furthermore, compatibility problems or inefficient code paths can introduce additional overhead and slowdowns, affecting the overall performance of the ML system.\n\nMaas, Martin, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, Kathryn S. McKinley, and Colin Raffel. 2024. “Combining Machine Learning and Lifetime-Based Resource Management for Memory Allocation and Beyond.” Commun. ACM 67 (4): 87–96. https://doi.org/10.1145/3611018.\nIncorrect Predictions or Outputs: Software faults in data preprocessing, feature engineering, or model evaluation can introduce biases, noise, or errors propagating through the ML pipeline and resulting in incorrect predictions or outputs. Over time, numerical instabilities, precision errors, or rounding issues can accumulate and lead to degraded accuracy or convergence problems in the trained models. Moreover, faults in the model serving or inference components can cause inconsistencies between the expected and actual outputs, leading to incorrect or unreliable predictions in production.\nReliability and Stability Issues: Software faults can cause Unparalleled exceptions, crashes, or sudden terminations that can compromise the reliability and stability of ML systems, especially in production environments. Intermittent or sporadic faults can be difficult to reproduce and diagnose, leading to unpredictable behavior and reduced confidence in the ML system’s outputs. Additionally, faults in checkpointing, model serialization, or state management can cause data loss or inconsistencies, affecting the reliability and recoverability of the ML system.\nSecurity Vulnerabilities: Software faults, such as buffer overflows, injection vulnerabilities, or improper access control, can introduce security risks and expose the ML system to potential attacks or unauthorized access. Adversaries may exploit faults in the preprocessing or feature extraction stages to manipulate the input data and deceive the ML models, leading to incorrect or malicious behavior. Furthermore, inadequate protection of sensitive data, such as user information or confidential model parameters, can lead to data breaches or privacy violations (Q. Li et al. 2023).\n\nLi, Qinbin, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, and Bingsheng He. 2023. “A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection.” IEEE Trans. Knowl. Data Eng. 35 (4): 3347–66. https://doi.org/10.1109/tkde.2021.3124599.\nDifficulty in Reproducing and Debugging: Software faults can make it challenging to reproduce and debug issues in ML systems, especially when the faults are intermittent or dependent on specific runtime conditions. Incomplete or ambiguous error messages, coupled with the complexity of ML frameworks and models, can prolong the debugging process and hinder the ability to identify and fix the underlying faults. Moreover, inconsistencies between development, testing, and production environments can make reproducing and diagnosing faults in specific contexts difficult.\nIncreased Development and Maintenance Costs Software faults can lead to increased development and maintenance costs, as teams spend more time and resources debugging, fixing, and validating the ML system. The need for extensive testing, monitoring, and fault-tolerant mechanisms to mitigate the impact of software faults can add complexity and overhead to the ML development process. Frequent patches, updates, and bug fixes to address software faults can disrupt the development workflow and require additional effort to ensure the stability and compatibility of the ML system.\nUnderstanding the potential impact of software faults on ML systems is crucial for prioritizing testing efforts, implementing fault-tolerant designs, and establishing effective monitoring and debugging practices. By proactively addressing software faults and their consequences, ML practitioners can build more robust, reliable, and secure ML systems that deliver accurate and trustworthy results.\n\n\nDetection and Mitigation\nDetecting and mitigating software faults in machine learning frameworks is essential to ensure ML systems’ reliability, performance, and security. Let’s explore various techniques and approaches that can be employed to identify and address software faults effectively:\nThorough Testing and Validation: Comprehensive unit testing of individual components and modules can verify their correctness and identify potential faults early in development. Integration testing validates the interaction and compatibility between different components of the ML framework, ensuring seamless integration. Systematic testing of edge cases, boundary conditions, and exceptional scenarios helps uncover hidden faults and vulnerabilities. Continuous testing and regression testing as shown in Figure 17.35 detect faults introduced by code changes or updates to the ML framework.\n\n\n\n\n\n\nFigure 17.35: Automated regression testing (Credit: UTOR)\n\n\n\nStatic Code Analysis and Linting: Utilizing static code analysis tools automatically identifies potential coding issues, such as syntax errors, undefined variables, or security vulnerabilities. Enforcing coding standards and best practices through linting tools maintains code quality and reduces the likelihood of common programming mistakes. Conducting regular code reviews allows manual inspection of the codebase, identification of potential faults, and ensures adherence to coding guidelines and design principles.\nRuntime Monitoring and Logging: Implementing comprehensive logging mechanisms captures relevant information during runtime, such as input data, model parameters, and system events. Monitoring key performance metrics, resource utilization, and error rates helps detect anomalies, performance bottlenecks, or unexpected behavior. Employing runtime assertion checks and invariants validates assumptions and detects violations of expected conditions during program execution. Utilizing profiling tools identifies performance bottlenecks, memory leaks, or inefficient code paths that may indicate the presence of software faults.\nFault-Tolerant Design Patterns: Implementing error handling and exception management mechanisms enables graceful handling and recovery from exceptional conditions or runtime errors. Employing redundancy and failover mechanisms, such as backup systems or redundant computations, ensures the availability and reliability of the ML system in the presence of faults. Designing modular and loosely coupled architectures minimizes the propagation and impact of faults across different components of the ML system. Utilizing checkpointing and recovery mechanisms (Eisenman et al. 2022) allows the system to resume from a known stable state in case of failures or interruptions.\n\nEisenman, Assaf, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, and Murali Annavaram. 2022. “Check-n-Run: A Checkpointing System for Training Deep Learning Recommendation Models.” In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), 929–43.\nRegular Updates and Patches: Staying up to date with the latest versions and patches of the ML frameworks, libraries, and dependencies provides benefits from bug fixes, security updates, and performance improvements. Monitoring release notes, security advisories, and community forums inform practitioners about known issues, vulnerabilities, or compatibility problems in the ML framework. Establishing a systematic process for testing and validating updates and patches before applying them to production systems ensures stability and compatibility.\nContainerization and Isolation: Leveraging containerization technologies, such as Docker or Kubernetes, encapsulates ML components and their dependencies in isolated environments. Utilizing containerization ensures consistent and reproducible runtime environments across development, testing, and production stages, reducing the likelihood of compatibility issues or environment-specific faults. Employing isolation techniques, such as virtual environments or sandboxing, prevents faults or vulnerabilities in one component from affecting other parts of the ML system.\nAutomated Testing and Continuous Integration/Continuous Deployment (CI/CD): Implement automated testing frameworks and scripts, execute comprehensive test suites, and catch faults early in development. Integrating automated testing into the CI/CD pipeline, as shown in Figure 17.36, ensures that code changes are thoroughly tested before being merged or deployed to production. Utilizing continuous monitoring and automated alerting systems detects and notifies developers and operators about potential faults or anomalies in real-time.\n\n\n\n\n\n\nFigure 17.36: Continuous Integration/Continuous Deployment (CI/CD) procedure (Credit: geeksforgeeks)\n\n\n\nAdopting a proactive and systematic approach to fault detection and mitigation can significantly improve ML systems’ robustness, reliability, and maintainability. By investing in comprehensive testing, monitoring, and fault-tolerant design practices, organizations can minimize the impact of software faults and ensure their ML systems’ smooth operation in production environments.\n\n\n\n\n\n\nExercise 17.4: Fault Tolerance\n\n\n\n\n\nGet ready to become an AI fault-fighting superhero! Software glitches can derail machine learning systems, but in this Colab, you’ll learn how to make them resilient. We’ll simulate software faults to see how AI can break, then explore techniques to save your ML model’s progress, like checkpoints in a game. You’ll see how to train your AI to bounce back after a crash, ensuring it stays on track. This is crucial for building reliable, trustworthy AI, especially in critical applications. So gear up because this Colab directly connects with the Robust AI chapter – you’ll move from theory to hands-on troubleshooting and build AI systems that can handle the unexpected!", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1958,7 +1958,7 @@ "section": "17.6 Tools and Frameworks", "text": "17.6 Tools and Frameworks\nGiven the significance or importance of developing robust AI systems, in recent years, researchers and practitioners have developed a wide range of tools and frameworks to understand how hardware faults manifest and propagate to impact ML systems. These tools and frameworks play a crucial role in evaluating the resilience of ML systems to hardware faults by simulating various fault scenarios and analyzing their impact on the system’s performance. This enables designers to identify potential vulnerabilities and develop effective mitigation strategies, ultimately creating more robust and reliable ML systems that can operate safely despite hardware faults. This section provides an overview of widely used fault models in the literature and the tools and frameworks developed to evaluate the impact of such faults on ML systems.\n\n17.6.1 Fault Models and Error Models\nAs discussed previously, hardware faults can manifest in various ways, including transient, permanent, and intermittent faults. In addition to the type of fault under study, how the fault manifests is also important. For example, does the fault happen in a memory cell or during the computation of a functional unit? Is the impact on a single bit, or does it impact multiple bits? Does the fault propagate all the way and impact the application (causing an error), or does it get masked quickly and is considered benign? All these details impact what is known as the fault model, which plays a major role in simulating and measuring what happens to a system when a fault occurs.\nTo effectively study and understand the impact of hardware faults on ML systems, it is essential to understand the concepts of fault models and error models. A fault model describes how a hardware fault manifests itself in the system, while an error model represents how the fault propagates and affects the system’s behavior.\nFault models can be categorized based on various characteristics:\n\nDuration: Transient faults occur briefly and then disappear, while permanent faults persist indefinitely. Intermittent faults occur sporadically and may be difficult to diagnose.\nLocation: Faults can occur in hardware parts, such as memory cells, functional units, or interconnects.\nGranularity: Faults can affect a single bit (e.g., bitflip) or multiple bits (e.g., burst errors) within a hardware component.\n\nOn the other hand, error models describe how a fault propagates through the system and manifests as an error. An error may cause the system to deviate from its expected behavior, leading to incorrect results or even system failures. Error models can be defined at different levels of abstraction, from the hardware level (e.g., register-level bitflips) to the software level (e.g., corrupted weights or activations in an ML model).\nThe fault model (or error model, typically the more applicable terminology in understanding the robustness of an ML system) plays a major role in simulating and measuring what happens to a system when a fault occurs. The chosen model informs the assumptions made about the system being studied. For example, a system focusing on single-bit transient errors (Sangchoolie, Pattabiraman, and Karlsson 2017) would not be well-suited to understand the impact of permanent, multi-bit flip errors (Wilkening et al. 2014), as it is designed assuming a different model altogether.\n\nWilkening, Mark, Vilas Sridharan, Si Li, Fritz Previlon, Sudhanva Gurumurthi, and David R. Kaeli. 2014. “Calculating Architectural Vulnerability Factors for Spatial Multi-Bit Transient Faults.” In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 293–305. IEEE; IEEE. https://doi.org/10.1109/micro.2014.15.\nFurthermore, implementing an error model is also an important consideration, particularly regarding where an error is said to occur in the compute stack. For instance, a single-bit flip model at the architectural register level differs from a single-bit flip in the weight of a model at the PyTorch level. Although both target a similar error model, the former would usually be modeled in an architecturally accurate simulator (like gem5 [binkert2011gem5]), which captures error propagation compared to the latter, focusing on value propagation through a model.\nRecent research has shown that certain characteristics of error models may exhibit similar behaviors across different levels of abstraction (Sangchoolie, Pattabiraman, and Karlsson 2017) (Papadimitriou and Gizopoulos 2021). For example, single-bit errors are generally more problematic than multi-bit errors, regardless of whether they are modeled at the hardware or software level. However, other characteristics, such as error masking (Mohanram and Touba 2003) as shown in Figure 17.37, may not always be accurately captured by software-level models, as they can hide underlying system effects.\n\nSangchoolie, Behrooz, Karthik Pattabiraman, and Johan Karlsson. 2017. “One Bit Is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors.” In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 97–108. IEEE; IEEE. https://doi.org/10.1109/dsn.2017.30.\n\nPapadimitriou, George, and Dimitris Gizopoulos. 2021. “Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers.” In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 902–15. IEEE; IEEE. https://doi.org/10.1109/isca52012.2021.00075.\n\nMohanram, K., and N. A. Touba. 2003. “Partial Error Masking to Reduce Soft Error Failure Rate in Logic Circuits.” In Proceedings. 16th IEEE Symposium on Computer Arithmetic, 433–40. IEEE; IEEE Comput. Soc. https://doi.org/10.1109/dftvs.2003.1250141.\n\n\n\n\n\n\nFigure 17.37: Example of error masking in microarchitectural components (Ko 2021)\n\n\nKo, Yohan. 2021. “Characterizing System-Level Masking Effects Against Soft Errors.” Electronics 10 (18): 2286. https://doi.org/10.3390/electronics10182286.\n\n\nSome tools, such as Fidelity (He, Balaprakash, and Li 2020), aim to bridge the gap between hardware-level and software-level error models by mapping patterns between the two levels of abstraction (Cheng et al. 2016). This allows for more accurate modeling of hardware faults in software-based tools, essential for developing robust and reliable ML systems. Lower-level tools typically represent more accurate error propagation characteristics but must be faster in simulating many errors due to the complex nature of hardware system designs. On the other hand, higher-level tools, such as those implemented in ML frameworks like PyTorch or TensorFlow, which we will discuss soon in the later sections, are often faster and more efficient for evaluating the robustness of ML systems.\n\nCheng, Eric, Shahrzad Mirkhani, Lukasz G. Szafaryn, Chen-Yong Cher, Hyungmin Cho, Kevin Skadron, Mircea R. Stan, et al. 2016. “Clear: \\(<\\)U\\(>\\)c\\(<\\)/u\\(>\\) Ross \\(<\\)u\\(>\\)-l\\(<\\)/u\\(>\\) Ayer \\(<\\)u\\(>\\)e\\(<\\)/u\\(>\\) Xploration for \\(<\\)u\\(>\\)a\\(<\\)/u\\(>\\) Rchitecting \\(<\\)u\\(>\\)r\\(<\\)/u\\(>\\) Esilience - Combining Hardware and Software Techniques to Tolerate Soft Errors in Processor Cores.” In Proceedings of the 53rd Annual Design Automation Conference, 1–6. ACM. https://doi.org/10.1145/2897937.2897996.\nIn the following subsections, we will discuss various hardware-based and software-based fault injection methods and tools, highlighting their capabilities, limitations, and the fault and error models they support.\n\n\n17.6.2 Hardware-based Fault Injection\nAn error injection tool is a tool that allows the user to implement a particular error model, such as a transient single-bit flip during inference Figure 17.38. Most error injection tools are software-based, as software-level tools are faster for ML robustness studies. However, hardware-based fault injection methods are still important for grounding the higher-level error models, as they are considered the most accurate way to study the impact of faults on ML systems by directly manipulating the hardware to introduce faults. These methods allow researchers to observe the system’s behavior under real-world fault conditions. Both software-based and hardware-based error injection tools are described in this section in more detail.\n\n\n\n\n\n\nFigure 17.38: Hardware errors can occur due to a variety of reasons and at different times and/or locations in a system, which can be explored when studying the impact of hardware-based errors on systems (Ahmadilivani et al. 2024)\n\n\nAhmadilivani, Mohammad Hasan, Mahdi Taheri, Jaan Raik, Masoud Daneshtalab, and Maksim Jenihhin. 2024. “A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks.” ACM Comput. Surv. 56 (6): 1–39. https://doi.org/10.1145/3638242.\n\n\n\nMethods\nTwo of the most common hardware-based fault injection methods are FPGA-based fault injection and radiation or beam testing.\nFPGA-based Fault Injection: Field-Programmable Gate Arrays (FPGAs) are reconfigurable integrated circuits that can be programmed to implement various hardware designs. In the context of fault injection, FPGAs offer high precision and accuracy, as researchers can target specific bits or sets of bits within the hardware. By modifying the FPGA configuration, faults can be introduced at specific locations and times during the execution of an ML model. FPGA-based fault injection allows for fine-grained control over the fault model, enabling researchers to study the impact of different types of faults, such as single-bit flips or multi-bit errors. This level of control makes FPGA-based fault injection a valuable tool for understanding the resilience of ML systems to hardware faults.\nRadiation or Beam Testing: Radiation or beam testing (Velazco, Foucard, and Peronnard 2010) involves exposing the hardware running an ML model to high-energy particles, such as protons or neutrons as illustrated in Figure 17.39. These particles can cause bitflips or other types of faults in the hardware, mimicking the effects of real-world radiation-induced faults. Beam testing is widely regarded as a highly accurate method for measuring the error rate induced by particle strikes on a running application. It provides a realistic representation of the faults in real-world environments, particularly in applications exposed to high radiation levels, such as space systems or particle physics experiments. However, unlike FPGA-based fault injection, beam testing could be more precise in targeting specific bits or components within the hardware, as it might be difficult to aim the beam of particles to a particular bit in the hardware. Despite being quite expensive from a research standpoint, beam testing is a well-regarded industry practice for reliability.\n\nVelazco, Raoul, Gilles Foucard, and Paul Peronnard. 2010. “Combining Results of Accelerated Radiation Tests and Fault Injections to Predict the Error Rate of an Application Implemented in SRAM-Based FPGAs.” IEEE Trans. Nucl. Sci. 57 (6): 3500–3505. https://doi.org/10.1109/tns.2010.2087355.\n\n\n\n\n\n\n\nFigure 17.39: Radiation test setup for semiconductor components (Lee et al. 2022) (Credit: JD Instrument)\n\n\nLee, Minwoong, Namho Lee, Huijeong Gwon, Jongyeol Kim, Younggwan Hwang, and Seongik Cho. 2022. “Design of Radiation-Tolerant High-Speed Signal Processing Circuit for Detecting Prompt Gamma Rays by Nuclear Explosion.” Electronics 11 (18): 2970. https://doi.org/10.3390/electronics11182970.\n\n\n\n\nLimitations\nDespite their high accuracy, hardware-based fault injection methods have several limitations that can hinder their widespread adoption:\nCost: FPGA-based fault injection and beam testing require specialized hardware and facilities, which can be expensive to set up and maintain. The cost of these methods can be a significant barrier for researchers and organizations with limited resources.\nScalability: Hardware-based methods are generally slower and less scalable than software-based methods. Injecting faults and collecting data on hardware can take time, limiting the number of experiments performed within a given timeframe. This can be particularly challenging when studying the resilience of large-scale ML systems or conducting statistical analyses that require many fault injection experiments.\nFlexibility: Hardware-based methods may not be as flexible as software-based methods in terms of the range of fault models and error models they can support. Modifying the hardware configuration or the experimental setup to accommodate different fault models can be more challenging and time-consuming than software-based methods.\nDespite these limitations, hardware-based fault injection methods remain essential tools for validating the accuracy of software-based methods and for studying the impact of faults on ML systems in realistic settings. By combining hardware-based and software-based methods, researchers can gain a more comprehensive understanding of ML systems’ resilience to hardware faults and develop effective mitigation strategies.\n\n\n\n17.6.3 Software-based Fault Injection Tools\nWith the rapid development of ML frameworks in recent years, software-based fault injection tools have gained popularity in studying the resilience of ML systems to hardware faults. These tools simulate the effects of hardware faults by modifying the software representation of the ML model or the underlying computational graph. The rise of ML frameworks such as TensorFlow, PyTorch, and Keras has facilitated the development of fault injection tools that are tightly integrated with these frameworks, making it easier for researchers to conduct fault injection experiments and analyze the results.\n\nAdvantages and Trade-offs\nSoftware-based fault injection tools offer several advantages over hardware-based methods:\nSpeed: Software-based tools are generally faster than hardware-based methods, as they do not require the modification of physical hardware or the setup of specialized equipment. This allows researchers to conduct more fault injection experiments in a shorter time, enabling more comprehensive analyses of the resilience of ML systems.\nFlexibility: Software-based tools are more flexible than hardware-based methods in terms of the range of fault and error models they can support. Researchers can easily modify the fault injection tool’s software implementation to accommodate different fault models or to target specific components of the ML system.\nAccessibility: Software-based tools are more accessible than hardware-based methods, as they do not require specialized hardware or facilities. This makes it easier for researchers and practitioners to conduct fault injection experiments and study the resilience of ML systems, even with limited resources.\n\n\nLimitations\nSoftware-based fault injection tools also have some limitations compared to hardware-based methods:\nAccuracy: Software-based tools may not always capture the full range of effects that hardware faults can have on the system. As these tools operate at a higher level of abstraction, they may need to catch up on some of the low-level hardware interactions and error propagation mechanisms that can impact the behavior of the ML system.\nFidelity: Software-based tools may provide a different level of Fidelity than hardware-based methods in terms of representing real-world fault conditions. The accuracy of the results obtained from software-based fault injection experiments may depend on how closely the software model approximates the actual hardware behavior.\n\n\n\n\n\n\nFigure 17.40: Comparison of techniques at layers of abstraction (Credit: MAVFI)\n\n\n\n\n\nTypes of Fault Injection Tools\nSoftware-based fault injection tools can be categorized based on their target frameworks or use cases. Here, we will discuss some of the most popular tools in each category:\nAres (Reagen et al. 2018), a fault injection tool initially developed for the Keras framework in 2018, emerged as one of the first tools to study the impact of hardware faults on deep neural networks (DNNs) in the context of the rising popularity of ML frameworks in the mid-to-late 2010s. The tool was validated against a DNN accelerator implemented in silicon, demonstrating its effectiveness in modeling hardware faults. Ares provides a comprehensive study on the impact of hardware faults in both weights and activation values, characterizing the effects of single-bit flips and bit-error rates (BER) on hardware structures. Later, the Ares framework was extended to support the PyTorch ecosystem, enabling researchers to investigate hardware faults in a more modern setting and further extending its utility in the field.\n\nReagen, Brandon, Udit Gupta, Lillian Pentecost, Paul Whatmough, Sae Kyu Lee, Niamh Mulholland, David Brooks, and Gu-Yeon Wei. 2018. “Ares: A Framework for Quantifying the Resilience of Deep Neural Networks.” In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465834.\n\n\n\n\n\n\nFigure 17.41: Hardware bitflips in ML workloads can cause phantom objects and misclassifications, which can erroneously be used downstream by larger systems, such as in autonomous driving. Shown above is a correct and faulty version of the same image using the PyTorchFI injection framework.\n\n\n\nPyTorchFI (Mahmoud et al. 2020), a fault injection tool specifically designed for the PyTorch framework, was developed in 2020 in collaboration with Nvidia Research. It enables the injection of faults into the weights, activations, and gradients of PyTorch models, supporting a wide range of fault models. By leveraging the GPU acceleration capabilities of PyTorch, PyTorchFI provides a fast and efficient implementation for conducting fault injection experiments on large-scale ML systems, as shown in Figure 17.41. The tool’s speed and ease of use have led to widespread adoption in the community, resulting in multiple developer-led projects, such as PyTorchALFI by Intel xColabs, which focuses on safety in automotive environments. Follow-up PyTorch-centric tools for fault injection include Dr. DNA by Meta (Ma et al. 2024) (which further facilitates the Pythonic programming model for ease of use), and the GoldenEye framework (Mahmoud et al. 2022), which incorporates novel numerical datatypes (such as AdaptivFloat (Tambe et al. 2020) and BlockFloat in the context of hardware bit flips.\n\nMahmoud, Abdulrahman, Neeraj Aggarwal, Alex Nobbe, Jose Rodrigo Sanchez Vicarte, Sarita V. Adve, Christopher W. Fletcher, Iuri Frosio, and Siva Kumar Sastry Hari. 2020. “PyTorchFI: A Runtime Perturbation Tool for DNNs.” In 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-w), 25–31. IEEE; IEEE. https://doi.org/10.1109/dsn-w50199.2020.00014.\n\nMa, Dongning, Fred Lin, Alban Desmaison, Joel Coburn, Daniel Moore, Sriram Sankar, and Xun Jiao. 2024. “Dr. DNA: Combating Silent Data Corruptions in Deep Learning Using Distribution of Neuron Activations.” In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 239–52. ACM. https://doi.org/10.1145/3620666.3651349.\n\nMahmoud, Abdulrahman, Thierry Tambe, Tarek Aloui, David Brooks, and Gu-Yeon Wei. 2022. “GoldenEye: A Platform for Evaluating Emerging Numerical Data Formats in DNN Accelerators.” In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 206–14. IEEE. https://doi.org/10.1109/dsn53405.2022.00031.\n\nTambe, Thierry, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020. “Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference.” In 2020 57th ACM/IEEE Design Automation Conference (DAC), 1–6. IEEE; IEEE. https://doi.org/10.1109/dac18072.2020.9218516.\n\nChen, Zitao, Niranjhana Narayanan, Bo Fang, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. 2020. “TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications.” In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 426–35. IEEE; IEEE. https://doi.org/10.1109/issre5003.2020.00047.\n\nChen, Zitao, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. 2019. “\\(<\\)i\\(>\\)BinFI\\(<\\)/i\\(>\\): An Efficient Fault Injector for Safety-Critical Machine Learning Systems.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC ’19. New York, NY, USA: ACM. https://doi.org/10.1145/3295500.3356177.\nTensorFI (Chen et al. 2020), or the TensorFlow Fault Injector, is a fault injection tool developed specifically for the TensorFlow framework. Analogous to Ares and PyTorchFI, TensorFI is considered the state-of-the-art tool for ML robustness studies in the TensorFlow ecosystem. It allows researchers to inject faults into the computational graph of TensorFlow models and study their impact on the model’s performance, supporting a wide range of fault models. One of the key benefits of TensorFI is its ability to evaluate the resilience of various ML models, not just DNNs. Further advancements, such as BinFi (Chen et al. 2019), provide a mechanism to speed up error injection experiments by focusing on the \"important\" bits in the system, accelerating the process of ML robustness analysis and prioritizing the critical components of a model.\nNVBitFI (T. Tsai et al. 2021), a general-purpose fault injection tool developed by Nvidia for their GPU platforms, operates at a lower level compared to framework-specific tools like Ares, PyTorchFI, and TensorFlow. While these tools focus on various deep learning platforms to implement and perform robustness analysis, NVBitFI targets the underlying hardware assembly code for fault injection. This allows researchers to inject faults into any application running on Nvidia GPUs, making it a versatile tool for studying the resilience of ML systems and other GPU-accelerated applications. By enabling users to inject errors at the architectural level, NVBitFI provides a more general-purpose fault model that is not restricted to just ML models. As Nvidia’s GPU systems are commonly used in many ML-based systems, NVBitFI is a valuable tool for comprehensive fault injection analysis across various applications.\n\nTsai, Timothy, Siva Kumar Sastry Hari, Michael Sullivan, Oreste Villa, and Stephen W. Keckler. 2021. “NVBitFI: Dynamic Fault Injection for GPUs.” In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 284–91. IEEE; IEEE. https://doi.org/10.1109/dsn48987.2021.00041.\n\nDomain-specific Examples\nDomain-specific fault injection tools have been developed to address various ML application domains’ unique challenges and requirements, such as autonomous vehicles and robotics. This section highlights three domain-specific fault injection tools: DriveFI and PyTorchALFI for autonomous vehicles and MAVFI for uncrewed aerial vehicles (UAVs). These tools enable researchers to inject hardware faults into these complex systems’ perception, control, and other subsystems, allowing them to study the impact of faults on system performance and safety. The development of these software-based fault injection tools has greatly expanded the capabilities of the ML community to develop more robust and reliable systems that can operate safely and effectively in the presence of hardware faults.\nDriveFI (Jha et al. 2019) is a fault injection tool designed for autonomous vehicles. It enables the injection of hardware faults into the perception and control pipelines of autonomous vehicle systems, allowing researchers to study the impact of these faults on the system’s performance and safety. DriveFI has been integrated with industry-standard autonomous driving platforms, such as Nvidia DriveAV and Baidu Apollo, making it a valuable tool for evaluating the resilience of autonomous vehicle systems.\n\nJha, Saurabh, Subho Banerjee, Timothy Tsai, Siva K. S. Hari, Michael B. Sullivan, Zbigniew T. Kalbarczyk, Stephen W. Keckler, and Ravishankar K. Iyer. 2019. “ML-Based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection.” In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 112–24. IEEE; IEEE. https://doi.org/10.1109/dsn.2019.00025.\n\nGräfe, Ralf, Qutub Syed Sha, Florian Geissler, and Michael Paulitsch. 2023. “Large-Scale Application of Fault Injection into PyTorch Models -an Extension to PyTorchFI for Validation Efficiency.” In 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-s), 56–62. IEEE; IEEE. https://doi.org/10.1109/dsn-s58398.2023.00025.\nPyTorchALFI (Gräfe et al. 2023) is an extension of PyTorchFI developed by Intel xColabs for the autonomous vehicle domain. It builds upon PyTorchFI’s fault injection capabilities. It adds features specifically tailored for evaluating the resilience of autonomous vehicle systems, such as the ability to inject faults into the camera and LiDAR sensor data.\nMAVFI (Hsiao et al. 2023) is a fault injection tool designed for the robotics domain, specifically for uncrewed aerial vehicles (UAVs). MAVFI is built on top of the Robot Operating System (ROS) framework and allows researchers to inject faults into the various components of a UAV system, such as sensors, actuators, and control algorithms. By evaluating the impact of these faults on the UAV’s performance and stability, researchers can develop more resilient and fault-tolerant UAV systems.\n\nHsiao, Yu-Shun, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman Mahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, and Vijay Janapa Reddi. 2023. “MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.” In 2023 Design, Automation &Amp; Test in Europe Conference &Amp; Exhibition (DATE), 1–6. IEEE; IEEE. https://doi.org/10.23919/date56975.2023.10137246.\nThe development of software-based fault injection tools has greatly expanded the capabilities of researchers and practitioners to study the resilience of ML systems to hardware faults. By leveraging the speed, flexibility, and accessibility of these tools, the ML community can develop more robust and reliable systems that can operate safely and effectively in the presence of hardware faults.\n\n\n\n\n17.6.4 Bridging the Gap between Hardware and Software Error Models\nWhile software-based fault injection tools offer many advantages in speed, flexibility, and accessibility, they may not always accurately capture the full range of effects that hardware faults can have on the system. This is because software-based tools operate at a higher level of abstraction than hardware-based methods and may miss some of the low-level hardware interactions and error propagation mechanisms that can impact the behavior of the ML system.\nAs Bolchini et al. (2023) illustrates in their work, hardware errors can manifest in complex spatial distribution patterns that are challenging to fully replicate with software-based fault injection alone. They identify four distinct patterns: (a) single point, where the fault corrupts a single value in a feature map; (b) same row, where the fault corrupts a partial or entire row in a single feature map; (c) bullet wake, where the fault corrupts the same location across multiple feature maps; and (d) shatter glass, which combines the effects of same row and bullet wake patterns, as shown in Figure 17.42. These intricate error propagation mechanisms highlight the need for hardware-aware fault injection techniques to accurately assess the resilience of ML systems.\n\n\n\n\n\n\nFigure 17.42: Hardware errors may manifest themselves in different ways at the software level, as classified by Bolchini et al. (Bolchini et al. 2023)\n\n\nBolchini, Cristiana, Luca Cassano, Antonio Miele, and Alessandro Toschi. 2023. “Fast and Accurate Error Simulation for CNNs Against Soft Errors.” IEEE Trans. Comput. 72 (4): 984–97. https://doi.org/10.1109/tc.2022.3184274.\n\n\nResearchers have developed tools to address this issue by bridging the gap between low-level hardware error models and higher-level software error models. One such tool is Fidelity, designed to map patterns between hardware-level faults and their software-level manifestations.\n\nFidelity: Bridging the Gap\nFidelity (He, Balaprakash, and Li 2020) is a tool for accurately modeling hardware faults in software-based fault injection experiments. It achieves this by carefully studying the relationship between hardware-level faults and their impact on the software representation of the ML system.\n\nHe, Yi, Prasanna Balaprakash, and Yanjing Li. 2020. “FIdelity: Efficient Resilience Analysis Framework for Deep Learning Accelerators.” In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 270–81. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00033.\nThe key insights behind Fidelity are:\n\nFault Propagation: Fidelity models how faults propagate through the hardware and manifest as errors in the software-visible state of the system. By understanding these propagation patterns, Fidelity can more accurately simulate the effects of hardware faults in software-based experiments.\nFault Equivalence: Fidelity identifies equivalent classes of hardware faults that produce similar software-level errors. This allows researchers to design software-based fault models that are representative of the underlying hardware faults without the need to model every possible hardware fault individually.\nLayered Approach: Fidelity employs a layered approach to fault modeling, where the effects of hardware faults are propagated through multiple levels of abstraction, from the hardware to the software level. This approach ensures that the software-based fault models are grounded in the actual behavior of the hardware.\n\nBy incorporating these insights, Fidelity enables software-based fault injection tools to capture the effects of hardware faults on ML systems accurately. This is particularly important for safety-critical applications, where the system’s resilience to hardware faults is paramount.\n\n\nImportance of Capturing True Hardware Behavior\nCapturing true hardware behavior in software-based fault injection tools is crucial for several reasons:\n\nAccuracy: By accurately modeling the effects of hardware faults, software-based tools can provide more reliable insights into the resilience of ML systems. This is essential for designing and validating fault-tolerant systems that can operate safely and effectively in the presence of hardware faults.\nReproducibility: When software-based tools accurately capture hardware behavior, fault injection experiments become more reproducible across different platforms and environments. This is important for the scientific study of ML system resilience, as it allows researchers to compare and validate results across different studies and implementations.\nEfficiency: Software-based tools that capture true hardware behavior can be more efficient in their fault injection experiments by focusing on the most representative and impactful fault models. This allows researchers to cover a wider range of fault scenarios and system configurations with limited computational resources.\nMitigation Strategies: Understanding how hardware faults manifest at the software level is crucial for developing effective mitigation strategies. By accurately capturing hardware behavior, software-based fault injection tools can help researchers identify the most vulnerable components of the ML system and design targeted hardening techniques to improve resilience.\n\nTools like Fidelity are vital in advancing the state-of-the-art in ML system resilience research. These tools enable researchers to conduct more accurate, reproducible, and efficient fault injection experiments by bridging the gap between hardware and software error models. As the complexity and criticality of ML systems continue to grow, the importance of capturing true hardware behavior in software-based fault injection tools will only become more apparent.\nOngoing research in this area aims to refine the mapping between hardware and software error models and develop new techniques for efficiently simulating hardware faults in software-based experiments. As these tools mature, they will provide the ML community with increasingly powerful and accessible means to study and improve the resilience of ML systems to hardware faults.", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1969,7 +1969,7 @@ "section": "17.7 Conclusion", "text": "17.7 Conclusion\nDeveloping robust and resilient AI is paramount as machine learning systems become increasingly integrated into safety-critical applications and real-world environments. This chapter has explored the key challenges to AI robustness arising from hardware faults, malicious attacks, distribution shifts, and software bugs.\nSome of the key takeaways include the following:\n\nHardware Faults: Transient, permanent, and intermittent faults in hardware components can corrupt computations and degrade the performance of machine learning models if not properly detected and mitigated. Techniques such as redundancy, error correction, and fault-tolerant designs play a crucial role in building resilient ML systems that can withstand hardware faults.\nModel Robustness: Malicious actors can exploit vulnerabilities in ML models through adversarial attacks and data poisoning, aiming to induce targeted misclassifications, skew the model’s learned behavior, or compromise the system’s integrity and reliability. Also, distribution shifts can occur when the data distribution encountered during deployment differs from those seen during training, leading to performance degradation. Implementing defensive measures, including adversarial training, anomaly detection, robust model architectures, and techniques such as domain adaptation, transfer learning, and continual learning, is essential to safeguard against these challenges and ensure the model’s reliability and generalization in dynamic environments.\nSoftware Faults: Faults in ML frameworks, libraries, and software stacks can propagate errors, degrade performance, and introduce security vulnerabilities. Rigorous testing, runtime monitoring, and adopting fault-tolerant design patterns are essential for building robust software infrastructure supporting reliable ML systems.\n\nAs ML systems take on increasingly complex tasks with real-world consequences, prioritizing resilience becomes critical. The tools and frameworks discussed in this chapter, including fault injection techniques, error analysis methods, and robustness evaluation frameworks, provide practitioners with the means to thoroughly test and harden their ML systems against various failure modes and adversarial conditions.\nMoving forward, resilience must be a central focus throughout the entire AI development lifecycle, from data collection and model training to deployment and monitoring. By proactively addressing the multifaceted challenges to robustness, we can develop trustworthy, reliable ML systems that can navigate the complexities and uncertainties of real-world environments.\nFuture research in robust ML should continue to advance techniques for detecting and mitigating faults, attacks, and distributional shifts. Additionally, exploring novel paradigms for developing inherently resilient AI architectures, such as self-healing systems or fail-safe mechanisms, will be crucial in pushing the boundaries of AI robustness. By prioritizing resilience and investing in developing robust AI systems, we can unlock the full potential of machine learning technologies while ensuring their safe, reliable, and responsible deployment in real-world applications. As AI continues to shape our future, building resilient systems that can withstand the challenges of the real world will be a defining factor in the success and societal impact of this transformative technology.", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1978,9 +1978,9 @@ "href": "contents/robust_ai/robust_ai.html#sec-robust-ai-resource", "title": "17  Robust AI", "section": "17.8 Resources", - "text": "17.8 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 17.1\nExercise 17.2\nExercise 17.3\nExercise 17.4\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "17.8 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will add new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage both students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nComing soon.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\nTo reinforce the concepts covered in this chapter, we have curated a set of exercises that challenge students to apply their knowledge and deepen their understanding.\n\nExercise 17.1\nExercise 17.2\nExercise 17.3\nExercise 17.4\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Advanced Topics", "17  Robust AI" ] }, @@ -1989,9 +1989,9 @@ "href": "contents/generative_ai/generative_ai.html", "title": "18  Generative AI", "section": "", - "text": "Coming soon!\n\n\n\n\n\n\nLearning Objectives\n\n\n\n\nComing soon.", + "text": "Coming soon!\nImagine a chapter that writes itself and adapts to your curiosity, generating new insights as you read. We’re working on something extraordinary!\nOur Generative AI chapter will transform how you read and learn, dynamically generating content as you go. While we fine-tune this revolutionary feature, get ready for an educational experience that’s as dynamic and unique as you are. Mark your calendars for the big reveal and bookmark this page.\nThe future of generative learning is here! — Vijay Janapa Reddi", "crumbs": [ - "MAIN", + "Advanced Topics", "18  Generative AI" ] }, @@ -2002,7 +2002,7 @@ "section": "", "text": "19.1 Introduction\nTo give ourselves a framework around which to think about AI for social good, we will be following the UN Sustainable Development Goals (SDGs). The UN SDGs are a collection of 17 global goals, shown in Figure 19.1, adopted by the United Nations in 2015 as part of the 2030 Agenda for Sustainable Development. The SDGs address global challenges related to poverty, inequality, climate change, environmental degradation, prosperity, and peace and justice.\nWhat is special about the SDGs is that they are a collection of interlinked objectives designed to serve as a “shared blueprint for peace and prosperity for people and the planet, now and into the future.” The SDGs emphasize sustainable development’s interconnected environmental, social, and economic aspects by putting sustainability at their center.\nA recent study (Vinuesa et al. 2020) highlights the influence of AI on all aspects of sustainable development, particularly on the 17 Sustainable Development Goals (SDGs) and 169 targets internationally defined in the 2030 Agenda for Sustainable Development. The study shows that AI can act as an enabler for 134 targets through technological improvements, but it also highlights the challenges of AI on some targets. The study shows that AI can benefit 67 targets when considering AI and societal outcomes. Still, it also warns about the issues related to the implementation of AI in countries with different cultural values and wealth.\nIn our book’s context, TinyML could help advance at least some of these SDG goals.\nThe portability, lower power requirements, and real-time analytics enabled by TinyML make it well-suited for addressing several sustainability challenges developing regions face. The widespread deployment of power solutions has the potential to provide localized and cost-effective monitoring to help achieve some of the UN’s SDGs. In the rest of the sections, we will dive into how TinyML is useful across many sectors that can address the UN SDGs.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2013,7 +2013,7 @@ "section": "", "text": "Vinuesa, Ricardo, Hossein Azizpour, Iolanda Leite, Madeline Balaam, Virginia Dignum, Sami Domisch, Anna Felländer, Simone Daniela Langhans, Max Tegmark, and Francesco Fuso Nerini. 2020. “The Role of Artificial Intelligence in Achieving the Sustainable Development Goals.” Nat. Commun. 11 (1): 1–10. https://doi.org/10.1038/s41467-019-14108-y.\n\n\n\n\n\n\nFigure 19.1: United Nations Sustainable Development Goals (SDG). Credit: United Nations.\n\n\n\n\n\nGoal 1 - No Poverty: TinyML could help provide low-cost solutions for crop monitoring to improve agricultural yields in developing countries.\nGoal 2 - Zero Hunger: TinyML could enable localized and precise crop health monitoring and disease detection to reduce crop losses.\nGoal 3 - Good Health and Wellbeing: TinyML could help enable low-cost medical diagnosis tools for early detection and prevention of diseases in remote areas.\nGoal 6 - Clean Water and Sanitation: TinyML could monitor water quality and detect contaminants to ensure Access to clean drinking water.\nGoal 7 - Affordable and Clean Energy: TinyML could optimize energy consumption and enable predictive maintenance for renewable energy infrastructure.\nGoal 11 - Sustainable Cities and Communities: TinyML could enable intelligent traffic management, air quality monitoring, and optimized resource management in smart cities.\nGoal 13 - Climate Action: TinyML could monitor deforestation and track reforestation efforts. It could also help predict extreme weather events.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2024,7 +2024,7 @@ "section": "19.2 Agriculture", "text": "19.2 Agriculture\nAgriculture is essential to achieving many of the UN Sustainable Development Goals, including eradicating Hunger and malnutrition, promoting economic growth, and using natural resources sustainably. TinyML can be a valuable tool to help advance sustainable agriculture, especially for smallholder farmers in developing regions.\nTinyML solutions can provide real-time monitoring and data analytics for crop health and growing conditions - all without reliance on connectivity infrastructure. For example, low-cost camera modules connected to microcontrollers can monitor for disease, pests, and nutritional deficiencies. TinyML algorithms can analyze the images to detect issues early before they spread and damage yields. Precision monitoring can optimize inputs like water, fertilizer, and pesticides - improving efficiency and sustainability.\nOther sensors, such as GPS units and accelerometers, can track microclimate conditions, soil humidity, and livestock wellbeing. Local real-time data helps farmers respond and adapt better to changes in the field. TinyML analytics at the edge avoids lag, network disruptions, and the high data costs of cloud-based systems. Localized systems allow customization of specific crops, diseases, and regional issues.\nWidespread TinyML applications can help digitize smallholder farms to increase productivity, incomes, and resilience. The low cost of hardware and minimal connectivity requirements make solutions accessible. Projects across the developing world have shown the benefits:\n\nMicrosoft’s FarmBeats project is an end-to-end approach to enable data-driven farming by using low-cost sensors, drones, and vision and machine learning algorithms. The project aims to solve the problem of limited adoption of technology in farming due to the need for more power and internet connectivity in farms and the farmers’ limited technology savviness. The project aims to increase farm productivity and reduce costs by coupling data with farmers’ knowledge and intuition about their farms. The project has successfully enabled actionable insights from data by building artificial intelligence (AI) or machine learning (ML) models based on fused data sets.\nIn Sub-Saharan Africa, off-the-shelf cameras and edge AI have cut cassava disease losses from 40% to 5%, protecting a staple crop (Ramcharan et al. 2017).\nIn Indonesia, sensors monitor microclimates across rice paddies, optimizing water usage even with erratic rains (Tirtalistyani, Murtiningrum, and Kanwar 2022).\n\n\nRamcharan, Amanda, Kelsee Baranowski, Peter McCloskey, Babuali Ahmed, James Legg, and David P. Hughes. 2017. “Deep Learning for Image-Based Cassava Disease Detection.” Front. Plant Sci. 8 (October): 1852. https://doi.org/10.3389/fpls.2017.01852.\n\nTirtalistyani, Rose, Murtiningrum Murtiningrum, and Rameshwar S. Kanwar. 2022. “Indonesia Rice Irrigation System: Time for Innovation.” Sustainability 14 (19): 12477. https://doi.org/10.3390/su141912477.\nWith greater investment and integration into rural advisory services, TinyML could transform small-scale agriculture and improve farmers’ livelihoods worldwide. The technology effectively brings the benefits of precision agriculture to disconnected regions most in need.\n\n\n\n\n\n\nExercise 19.1: Crop Yield Modeling\n\n\n\n\n\nThis exercise teaches you how to predict crop yields in Nepal by combining satellite data (Sentinel-2), climate data (WorldClim), and on-the-ground measurements. You’ll use a machine learning algorithm called XGBoost Regressor to build a model, split the data for training and testing, and fine-tune the model parameters for the best performance. This notebook lays the foundation for implementing TinyML in the agriculture domain. Consider how you could adapt this process for smaller datasets, fewer features, and simplified models to make it compatible with the power and memory constraints of TinyML devices.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2035,7 +2035,7 @@ "section": "19.3 Healthcare", "text": "19.3 Healthcare\n\n19.3.1 Expanding Access\nUniversal health coverage and quality care remain out of reach for millions worldwide. In many regions, more medical professionals are required to Access basic diagnosis and treatment. Additionally, healthcare infrastructure like clinics, hospitals, and utilities to power complex equipment needs to be improved. These gaps disproportionately impact marginalized communities, exacerbating health disparities.\nTinyML offers a promising technological solution to help expand Access to quality healthcare globally. TinyML refers to the ability to deploy machine learning algorithms on microcontrollers, tiny chips with processing power, memory, and connectivity. TinyML enables real-time data analysis and intelligence in low-powered, compact devices.\nThis creates opportunities for transformative medical tools that are portable, affordable, and accessible. TinyML software and hardware can be optimized to run even in resource-constrained environments. For example, a TinyML system could analyze symptoms or make diagnostic predictions using minimal computing power, no continuous internet connectivity, and a battery or solar power source. These capabilities can bring medical-grade screening and monitoring directly to underserved patients.\n\n\n19.3.2 Early Diagnosis\nEarly detection of diseases is one major application. Small sensors paired with TinyML software can identify symptoms before conditions escalate or visible signs appear. For instance, cough monitors with embedded machine learning can pick up on acoustic patterns indicative of respiratory illness, malaria, or tuberculosis. Detecting diseases at onset improves outcomes and reduces healthcare costs.\nA detailed example could be given for TinyML monitoring pneumonia in children. Pneumonia is a leading cause of death for children under 5, and detecting it early is critical. A startup called Respira xColabs has developed a low-cost wearable audio sensor that uses TinyML algorithms to analyze coughs and identify symptoms of respiratory illnesses like pneumonia. The device contains a microphone sensor and microcontroller that runs a neural network model trained to classify respiratory sounds. It can identify features like wheezing, crackling, and stridor that may indicate pneumonia. The device is designed to be highly accessible - it has a simple strap, requires no battery or charging, and results are provided through LED lights and audio cues.\nAnother example involves researchers at UNIFEI in Brazil who have developed a low-cost device that leverages TinyML to monitor heart rhythms. Their innovative solution addresses a critical need - atrial fibrillation and other heart rhythm abnormalities often go undiagnosed due to the prohibitive cost and limited availability of screening tools. The device overcomes these barriers through its ingenious design. It uses an off-the-shelf microcontroller that costs only a few dollars, along with a basic pulse sensor. By minimizing complexity, the device becomes accessible to under-resourced populations. The TinyML algorithm running locally on the microcontroller analyzes pulse data in real-time to detect irregular heart rhythms. This life-saving heart monitoring device demonstrates how TinyML enables powerful AI capabilities to be deployed in cost-effective, user-friendly designs.\nTinyML’s versatility also shows promise for tackling infectious diseases. Researchers have proposed applying TinyML to identify malaria-spreading mosquitoes by their wingbeat sounds. When equipped with microphones, small microcontrollers can run advanced audio classification models to determine mosquito species. This compact, low-power solution produces results in real time, suitable for remote field use. By making entomology analytics affordable and accessible, TinyML could revolutionize monitoring insects that endanger human health. TinyML is expanding healthcare access for vulnerable communities from heart disease to malaria.\n\n\n19.3.3 Infectious Disease Control\nMosquitoes remain the most deadly disease vector worldwide, transmitting illnesses that infect over one billion people annually (“Vector-Borne Diseases,” n.d.). Diseases like malaria, dengue, and Zika are especially prevalent in resource-limited regions lacking robust infrastructure for mosquito control. Monitoring local mosquito populations is essential to prevent outbreaks and properly target interventions.\n\n“Vector-Borne Diseases.” n.d. https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases.\nTraditional monitoring methods are expensive, labor-intensive, and difficult to deploy remotely. The proposed TinyML solution aims to overcome these barriers. Small microphones coupled with machine learning algorithms can classify mosquitoes by species based on minute differences in wing oscillations. The TinyML software runs efficiently on low-cost microcontrollers, eliminating the need for continuous connectivity.\nA collaborative research team from the University of Khartoum and the ICTP is exploring an innovative solution using TinyML. In a recent paper, they presented a low-cost device that can identify disease-spreading mosquito species through their wing beat sounds (Altayeb, Zennaro, and Rovai 2022).\n\nAltayeb, Moez, Marco Zennaro, and Marcelo Rovai. 2022. “Classifying Mosquito Wingbeat Sound Using TinyML.” In Proceedings of the 2022 ACM Conference on Information Technology for Social Good, 132–37. ACM. https://doi.org/10.1145/3524458.3547258.\nThis portable, self-contained system shows great promise for entomology. The researchers suggest it could revolutionize insect monitoring and vector control strategies in remote areas. TinyML could significantly bolster malaria eradication efforts by providing cheaper, easier mosquito analytics. Its versatility and minimal power needs make it ideal for field use in isolated, off-grid regions with scarce resources but high disease burden.\n\n\n19.3.4 TinyML Design Contest in Healthcare\nThe first TinyML contest in healthcare, TDC’22 (Jia et al. 2023), was held in 2022 to motivate participating teams to design AI/ML algorithms for detecting life-threatening ventricular arrhythmias (VAs) and deploy them on Implantable Cardioverter Defibrillators (ICDs). VAs are the main cause of sudden cardiac death (SCD). People at high risk of SCD rely on the ICD to deliver proper and timely defibrillation treatment (i.e., shocking the heart back into normal rhythm) when experiencing life-threatening VAs.\n\nJia, Zhenge, Dawei Li, Xiaowei Xu, Na Li, Feng Hong, Lichuan Ping, and Yiyu Shi. 2023. “Life-Threatening Ventricular Arrhythmia Detection Challenge in Implantable Cardioverterdefibrillators.” Nature Machine Intelligence 5 (5): 554–55. https://doi.org/10.1038/s42256-023-00659-9.\nAn on-device algorithm for early and timely life-threatening VA detection will increase the chances of survival. The proposed AI/ML algorithm needed to be deployed and executed on an extremely low-power and resource-constrained microcontroller (MCU) (a $10 development board with an ARM Cortex-M4 core at 80 MHz, 256 kB of flash memory and 64 kB of SRAM). The submitted designs were evaluated by metrics measured on the MCU for (1) detection performance, (2) inference latency, and (3) memory occupation by the program of AI/ML algorithms.\nThe champion, GaTech EIC Lab, obtained 0.972 in \\(F_\\beta\\) (F1 score with a higher weight to recall), 1.747 ms in latency, and 26.39 kB in memory footprint with a deep neural network. An ICD with an on-device VA detection algorithm was implanted in a clinical trial.\n\n\n\n\n\n\nExercise 19.2: Clinical Data: Unlocking Insights with Named Entity Recognition\n\n\n\n\n\nIn this exercise, you’ll learn about Named Entity Recognition (NER), a powerful tool for extracting valuable information from clinical text. Using Spark NLP, a specialized library for healthcare NLP, we’ll explore how NER models like BiLSTM-CNN-Char and BERT can automatically identify important medical entities such as diagnoses, medications, test results, and more. You’ll get hands-on experience applying these techniques with a special focus on oncology-related data extraction, helping you unlock insights about cancer types and treatment details from patient records.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2046,7 +2046,7 @@ "section": "19.4 Science", "text": "19.4 Science\nIn many scientific fields, researchers are limited by the quality and resolution of data they can collect. They often must indirectly infer the true parameters of interest using approximate correlations and models built on sparse data points. This constrains the accuracy of scientific understanding and predictions.\nThe emergence of TinyML opens new possibilities for gathering high-fidelity scientific measurements. With embedded machine learning, tiny, low-cost sensors can automatically process and analyze data locally in real-time. This creates intelligent sensor networks that capture nuanced data at much greater scales and frequencies.\nFor example, monitoring environmental conditions to model climate change remains challenging due to the need for widespread, continuous data. The Ribbit Project from UC Berkeley is pioneering a crowdsourced TinyML solution (Rao 2021). They developed an open-source CO2 sensor that uses an onboard microcontroller to process the gas measurements. An extensive dataset can be aggregated by distributing hundreds of these low-cost sensors. The TinyML devices compensate for environmental factors and provide previously impossible, granular, accurate readings.\n\nRao, Ravi. 2021. Www.wevolver.com. https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies.\nThe potential to massively scale out intelligent sensing via TinyML has profound scientific implications. Higher-resolution data can lead to discoveries and predictive capabilities in fields ranging from ecology to cosmology. Other applications could include seismic sensors for earthquake early warning systems, distributed weather monitors to track microclimate changes, and acoustic sensors to study animal populations.\nAs sensors and algorithms continue improving, TinyML networks may generate more detailed maps of natural systems than ever before. Democratizing the collection of scientific data can accelerate research and understanding across disciplines. However, it raises new challenges around data quality, privacy, and modeling unknowns. TinyML signifies a growing convergence of AI and the natural sciences to answer fundamental questions.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2057,7 +2057,7 @@ "section": "19.5 Conservation and Environment", "text": "19.5 Conservation and Environment\nTinyML is emerging as a powerful tool for environmental conservation and sustainability efforts. Recent research has highlighted numerous applications of tiny machine learning in domains such as wildlife monitoring, natural resource management, and tracking climate change.\nOne example is using TinyML for real-time wildlife tracking and protection. Researchers have developed Smart Wildlife Tracker devices that leverage TinyML algorithms to detect poaching activities. The collars contain sensors like cameras, microphones, and GPS to monitor the surrounding environment continuously. Embedded machine learning models analyze the audio and visual data to identify threats like nearby humans or gunshots. Early poaching detection gives wildlife rangers critical information to intervene and take action.\nOther projects apply TinyML to study animal behavior through sensors. The smart wildlife collar uses accelerometers and acoustic monitoring to track elephant movements, communication, and moods (Verma 2022). The low-power TinyML collar devices transmit rich data on elephant activities while avoiding burdensome Battery changes. This helps researchers unobtrusively observe elephant populations to inform conservation strategies.\n\nVerma, Team Dual_Boot: Swapnil. 2022. “Elephant AI.” Hackster.io. https://www.hackster.io/dual\\_boot/elephant-ai-ba71e9.\nOn a broader scale, distributed TinyML devices are envisioned to create dense sensor networks for environmental modeling. Hundreds of low-cost air quality monitors could map pollution across cities. Underwater sensors may detect toxins and give early warning of algal blooms. Such applications underscore TinyML’s versatility in ecology, climatology, and sustainability.\nResearchers from Moulay Ismail University of Meknes in Morocco (Bamoumen et al. 2022) have published a survey on how TinyML can be used to solve environmental issues. However, thoughtfully assessing benefits, risks, and equitable Access will be vital as TinyML expands environmental research and conservation. With ethical consideration of impacts, TinyML offers data-driven solutions to protect biodiversity, natural resources, and our planet.\n\nBamoumen, Hatim, Anas Temouden, Nabil Benamar, and Yousra Chtouki. 2022. “How TinyML Can Be Leveraged to Solve Environmental Problems: A Survey.” In 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), 338–43. IEEE; IEEE. https://doi.org/10.1109/3ict56508.2022.9990661.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2066,9 +2066,9 @@ "href": "contents/ai_for_good/ai_for_good.html#disaster-response", "title": "19  AI for Good", "section": "19.6 Disaster Response", - "text": "19.6 Disaster Response\nIn disaster response, speed and safety are paramount. But rubble and wreckage create hazardous, confined environments that impede human search efforts. TinyML enables nimble drones to assist rescue teams in these dangerous scenarios.\nWhen buildings collapse after earthquakes, small drones can prove invaluable. Equipped with TinyML navigation algorithms, micro-sized drones like the CrazyFlie can traverse cramped voids and map pathways beyond human reach (Bardienus P. Duisterhof et al. 2019). Obstacle avoidance allows the drones to weave through unstable debris. This autonomous mobility lets them rapidly sweep areas humans cannot access.\n\nDuisterhof, Bardienus P, Srivatsan Krishnan, Jonathan J Cruz, Colby R Banbury, William Fu, Aleksandra Faust, Guido CHE de Croon, and Vijay Janapa Reddi. 2019. “Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller.” ArXiv Preprint abs/1909.11236. https://arxiv.org/abs/1909.11236.\nThe video below presents the (Bardienus P. Duisterhof et al. 2019) paper on deep reinforcement learning using drones for source-seeking.\n\nCrucially, onboard sensors and TinyML processors analyze real-time data to identify signs of survivors. Thermal cameras detect body heat, microphones pick up calls for help, and gas sensors warn of leaks (Bardienus P. Duisterhof et al. 2021). Processing data locally using TinyML allows for quick interpretation to guide rescue efforts. As conditions evolve, the drones can adapt by adjusting their search patterns and priorities.\nThe following video is an overview of autonomous drones for gas leak detection.\n\nAdditionally, coordinated swarms of drones unlock new capabilities. By collaborating and sharing insights, drone teams comprehensively view the situation. Blanketing disaster sites allows TinyML algorithms to fuse and analyze data from multiple vantage points, amplifying situational awareness beyond individual drones (Bardienus P. Duisterhof et al. 2021).\n\nDuisterhof, Bardienus P., Shushuai Li, Javier Burgues, Vijay Janapa Reddi, and Guido C. H. E. de Croon. 2021. “Sniffy Bug: A Fully Autonomous Swarm of Gas-Seeking Nano Quadcopters in Cluttered Environments.” In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 9099–9106. IEEE; IEEE. https://doi.org/10.1109/iros51168.2021.9636217.\nMost importantly, initial drone reconnaissance enhances safety for human responders. Keeping rescue teams at a safe distance until drone surveys assess hazards saves lives. Once secured, drones can guide precise personnel placement.\nBy combining agile mobility, real-time data, and swarm coordination, TinyML-enabled drones promise to transform disaster response. Their versatility, speed, and safety make them a vital asset for rescue efforts in dangerous, inaccessible environments. Integrating autonomous drones with traditional methods can accelerate responses when it matters most.", + "text": "19.6 Disaster Response\nIn disaster response, speed and safety are paramount. But rubble and wreckage create hazardous, confined environments that impede human search efforts. TinyML enables nimble drones to assist rescue teams in these dangerous scenarios.\nWhen buildings collapse after earthquakes, small drones can prove invaluable. Equipped with TinyML navigation algorithms, micro-sized drones like the CrazyFlie can traverse cramped voids and map pathways beyond human reach (Bardienus P. Duisterhof et al. 2019). Obstacle avoidance allows the drones to weave through unstable debris. This autonomous mobility lets them rapidly sweep areas humans cannot access. Video 19.1 presents the (Bardienus P. Duisterhof et al. 2019) paper on deep reinforcement learning using drones for source-seeking.\n\nDuisterhof, Bardienus P, Srivatsan Krishnan, Jonathan J Cruz, Colby R Banbury, William Fu, Aleksandra Faust, Guido CHE de Croon, and Vijay Janapa Reddi. 2019. “Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller.” ArXiv Preprint abs/1909.11236. https://arxiv.org/abs/1909.11236.\n\n\n\n\n\n\nVideo 19.1: Learning to Seek\n\n\n\n\n\n\nCrucially, onboard sensors and TinyML processors analyze real-time data to identify signs of survivors. Thermal cameras detect body heat, microphones pick up calls for help, and gas sensors warn of leaks (Bardienus P. Duisterhof et al. 2021). Processing data locally using TinyML allows for quick interpretation to guide rescue efforts. As conditions evolve, the drones can adapt by adjusting their search patterns and priorities. Video 19.2 is an overview of autonomous drones for gas leak detection.\n\n\n\n\n\n\nVideo 19.2\n\n\n\n\n\n\nAdditionally, coordinated swarms of drones unlock new capabilities. By collaborating and sharing insights, drone teams comprehensively view the situation. Blanketing disaster sites allows TinyML algorithms to fuse and analyze data from multiple vantage points, amplifying situational awareness beyond individual drones (Bardienus P. Duisterhof et al. 2021).\n\nDuisterhof, Bardienus P., Shushuai Li, Javier Burgues, Vijay Janapa Reddi, and Guido C. H. E. de Croon. 2021. “Sniffy Bug: A Fully Autonomous Swarm of Gas-Seeking Nano Quadcopters in Cluttered Environments.” In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 9099–9106. IEEE; IEEE. https://doi.org/10.1109/iros51168.2021.9636217.\nMost importantly, initial drone reconnaissance enhances safety for human responders. Keeping rescue teams at a safe distance until drone surveys assess hazards saves lives. Once secured, drones can guide precise personnel placement.\nBy combining agile mobility, real-time data, and swarm coordination, TinyML-enabled drones promise to transform disaster response. Their versatility, speed, and safety make them a vital asset for rescue efforts in dangerous, inaccessible environments. Integrating autonomous drones with traditional methods can accelerate responses when it matters most.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2079,7 +2079,7 @@ "section": "19.7 Education and Outreach", "text": "19.7 Education and Outreach\nTinyML holds immense potential to help address challenges in developing regions, but realizing its benefits requires focused education and capacity building. Recognizing this need, academic researchers have spearheaded outreach initiatives to spread TinyML education globally.\nIn 2020, Harvard University, Columbia University, the International Centre for Theoretical Physics (ICTP), and UNIFEI jointly founded the TinyML for Developing Communities (TinyML4D) network (Zennaro, Plancher, and Reddi 2022). This network empowers universities and researchers in developing countries to harness TinyML for local impact.\n\nZennaro, Marco, Brian Plancher, and V Janapa Reddi. 2022. “TinyML: Applied AI for Development.” In The UN 7th Multi-Stakeholder Forum on Science, Technology and Innovation for the Sustainable Development Goals, 2022–05.\nA core focus is expanding Access to applied machine learning education. The TinyML4D network provides training, curricula, and lab resources to members. Hands-on workshops and data collection projects give students practical experience. Members can share best practices and build a community through conferences and academic collaborations.\nThe network prioritizes enabling locally relevant TinyML solutions. Projects address challenges like agriculture, health, and environmental monitoring based on community needs. For example, a member university in Rwanda developed a low-cost flood monitoring system using TinyML and sensors.\nTinyML4D includes over 50 member institutions across Africa, Asia, and Latin America. However, greater investments and industry partnerships are needed to reach all underserved regions. The ultimate vision is training new generations to ethically apply TinyML for sustainable development. Outreach efforts today lay the foundation for democratizing transformative technology for the future.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2088,9 +2088,9 @@ "href": "contents/ai_for_good/ai_for_good.html#accessibility", "title": "19  AI for Good", "section": "19.8 Accessibility", - "text": "19.8 Accessibility\nTechnology has immense potential to break down barriers faced by people with disabilities and bridge gaps in accessibility. TinyML specifically opens new possibilities for developing intelligent, personalized assistive devices.\nWith machine learning algorithms running locally on microcontrollers, compact accessibility tools can operate in real time without reliance on connectivity. The National Institute on Deafness and Other Communication Disorders (NIDCD) states that 20% of the world’s population has some form of hearing loss. Hearing aids leveraging TinyML could recognize multiple speakers and amplify the voice of a chosen target in crowded rooms. This allows people with hearing impairments to focus on specific conversations.\nSimilarly, mobility devices could use on-device vision processing to identify obstacles and terrain characteristics. This enables enhanced navigation and safety for the visually impaired. Companies like Envision are developing smart glasses, converting visual information into speech, with embedded TinyML to guide blind people by detecting objects, text, and traffic signals.\nThe video below shows the different real-life use cases of the Envision visual aid glasses.\n\nTinyML could even power responsive prosthetic limbs. By analyzing nerve signals and sensory data like muscle tension, prosthetics and exoskeletons with embedded ML can move and adjust grip dynamically, making control more natural and intuitive. Companies are creating affordable, everyday bionic hands using TinyML. For those with speech difficulties, voice-enabled devices with TinyML can generate personalized vocal outputs from non-verbal inputs. Pairs by Anthropic translates gestures into natural speech tailored for individual users.\nBy enabling more customizable assistive tech, TinyML makes services more accessible and tailored to individual needs. And through translation and interpretation applications, TinyML can break down communication barriers. Apps like Microsoft Translator offer real-time translation powered by TinyML algorithms.\nWith its thoughtful and inclusive design, TinyML promises more autonomy and dignity for people with disabilities. However, developers should engage communities directly, avoid compromising privacy, and consider affordability to maximize the benefits. TinyML has huge potential to contribute to a more just, equitable world.", + "text": "19.8 Accessibility\nTechnology has immense potential to break down barriers faced by people with disabilities and bridge gaps in accessibility. TinyML specifically opens new possibilities for developing intelligent, personalized assistive devices.\nWith machine learning algorithms running locally on microcontrollers, compact accessibility tools can operate in real time without reliance on connectivity. The National Institute on Deafness and Other Communication Disorders (NIDCD) states that 20% of the world’s population has some form of hearing loss. Hearing aids leveraging TinyML could recognize multiple speakers and amplify the voice of a chosen target in crowded rooms. This allows people with hearing impairments to focus on specific conversations.\nSimilarly, mobility devices could use on-device vision processing to identify obstacles and terrain characteristics. This enables enhanced navigation and safety for the visually impaired. Companies like Envision are developing smart glasses, converting visual information into speech, with embedded TinyML to guide blind people by detecting objects, text, and traffic signals. Video 19.3 below shows the different real-life use cases of the Envision visual aid glasses.\n\n\n\n\n\n\nVideo 19.3\n\n\n\n\n\n\nTinyML could even power responsive prosthetic limbs. By analyzing nerve signals and sensory data like muscle tension, prosthetics and exoskeletons with embedded ML can move and adjust grip dynamically, making control more natural and intuitive. Companies are creating affordable, everyday bionic hands using TinyML. For those with speech difficulties, voice-enabled devices with TinyML can generate personalized vocal outputs from non-verbal inputs. Pairs by Anthropic translates gestures into natural speech tailored for individual users.\nBy enabling more customizable assistive tech, TinyML makes services more accessible and tailored to individual needs. And through translation and interpretation applications, TinyML can break down communication barriers. Apps like Microsoft Translator offer real-time translation powered by TinyML algorithms.\nWith its thoughtful and inclusive design, TinyML promises more autonomy and dignity for people with disabilities. However, developers should engage communities directly, avoid compromising privacy, and consider affordability to maximize the benefits. TinyML has huge potential to contribute to a more just, equitable world.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2101,7 +2101,7 @@ "section": "19.9 Infrastructure and Urban Planning", "text": "19.9 Infrastructure and Urban Planning\nAs urban populations swell, cities face immense challenges in efficiently managing resources and infrastructure. TinyML presents a powerful tool for developing intelligent systems to optimize city operations and sustainability. It could revolutionize energy efficiency in smart buildings.\nMachine learning models can learn to predict and regulate energy usage based on occupancy patterns. Miniaturized sensors placed throughout buildings can provide granular, real-time data on space utilization, temperature, and more (Seyedzadeh et al. 2018). This visibility allows TinyML systems to minimize waste by optimizing heating, cooling, lighting, etc.\n\nSeyedzadeh, Saleh, Farzad Pour Rahimian, Ivan Glesk, and Marc Roper. 2018. “Machine Learning for Estimation of Building Energy Consumption and Performance: A Review.” Visualization in Engineering 6 (1): 1–20. https://doi.org/10.1186/s40327-018-0064-7.\nThese examples demonstrate TinyML’s huge potential for efficient, sustainable city infrastructure. However, urban planners must consider privacy, security, and accessibility to ensure responsible adoption. With careful implementation, TinyML could profoundly modernize urban life.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2112,7 +2112,7 @@ "section": "19.10 Challenges and Considerations", "text": "19.10 Challenges and Considerations\nWhile TinyML presents immense opportunities, thoughtful consideration of challenges and ethical implications will be critical as adoption spreads globally. Researchers have highlighted key factors to address, especially when deploying TinyML in developing regions.\nA foremost challenge is limited Access to training and hardware (Ooko et al. 2021). Only educational programs exist tailored to TinyML, and emerging economies often need a robust electronics supply chain. Thorough training and partnerships will be needed to nurture expertise and make devices available to underserved communities. Initiatives like the TinyML4D network help provide structured learning pathways.\n\nOoko, Samson Otieno, Marvin Muyonga Ogore, Jimmy Nsenga, and Marco Zennaro. 2021. “TinyML in Africa: Opportunities and Challenges.” In 2021 IEEE Globecom Workshops (GC Wkshps), 1–6. IEEE; IEEE. https://doi.org/10.1109/gcwkshps52748.2021.9682107.\nData limitations also pose hurdles. TinyML models require quality localized datasets, which are scarce in under-resourced environments. Creating frameworks to crowdsource data ethically could address this. However, data collection should benefit local communities directly, not just extract value.\nOptimizing power usage and connectivity will be vital for sustainability. TinyML’s low power needs make it ideal for off-grid use cases. Integrating battery or solar can enable continuous operation. Adapting devices for low-bandwidth transmission where the internet is limited also maximizes impact.\nCultural and language barriers further complicate adoption. User interfaces and devices should account for all literacy levels and avoid excluding subgroups. Voice-controllable solutions in local dialects can enhance accessibility.\nAddressing these challenges requires holistic partnerships, funding, and policy support. However, inclusively and ethically scaling TinyML has monumental potential to uplift disadvantaged populations worldwide. With thoughtful implementation, the technology could profoundly democratize opportunity.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2123,7 +2123,7 @@ "section": "19.11 Conclusion", "text": "19.11 Conclusion\nTinyML presents a tremendous opportunity to harness the power of artificial intelligence to advance the UN Sustainable Development Goals and drive social impact globally, as highlighted by examples across sectors like healthcare, agriculture, conservation, and more; embedded machine learning unlocks new capabilities for low-cost, accessible solutions tailored to local contexts. TinyML circumvents barriers like poor infrastructure, limited connectivity, and high costs that often exclude developing communities from emerging technology.\nHowever, realizing TinyML’s full potential requires holistic collaboration. Researchers, policymakers, companies, and local stakeholders must collaborate to provide training, establish ethical frameworks, co-design solutions, and adapt them to community needs. Through inclusive development and deployment, TinyML can deliver on its promise to bridge inequities and uplift vulnerable populations without leaving any behind.\nIf cultivated responsibly, TinyML could democratize opportunity and accelerate progress on global priorities from poverty alleviation to climate resilience. The technology represents a new wave of applied AI to empower societies, promote sustainability, and propel humanity toward greater justice, prosperity, and peace. TinyML provides a glimpse into an AI-enabled future that is accessible to all.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2132,9 +2132,9 @@ "href": "contents/ai_for_good/ai_for_good.html#sec-ai-for-good-resource", "title": "19  AI for Good", "section": "19.12 Resources", - "text": "19.12 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nTinyML for Social Impact.\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\n\nExercise 19.1\nExercise 19.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", + "text": "19.12 Resources\nHere is a curated list of resources to support students and instructors in their learning and teaching journeys. We are continuously working on expanding this collection and will be adding new exercises soon.\n\n\n\n\n\n\nSlides\n\n\n\n\n\nThese slides are a valuable tool for instructors to deliver lectures and for students to review the material at their own pace. We encourage students and instructors to leverage these slides to enhance their understanding and facilitate effective knowledge transfer.\n\nTinyML for Social Impact.\n\n\n\n\n\n\n\n\n\n\nVideos\n\n\n\n\n\n\nVideo 19.1\nVideo 19.2\nVideo 19.3\n\n\n\n\n\n\n\n\n\n\nExercises\n\n\n\n\n\n\nExercise 19.1\nExercise 19.2\n\n\n\n\n\n\n\n\n\n\nLabs\n\n\n\n\n\nIn addition to exercises, we offer a series of hands-on labs allowing students to gain practical experience with embedded AI technologies. These labs provide step-by-step guidance, enabling students to develop their skills in a structured and supportive environment. We are excited to announce that new labs will be available soon, further enriching the learning experience.\n\nComing soon.", "crumbs": [ - "MAIN", + "Impact", "19  AI for Good" ] }, @@ -2145,7 +2145,7 @@ "section": "", "text": "20.1 Introduction\nThis book examines the rapidly evolving field of ML systems (Chapter 2). We focus on systems because while there are many resources on ML models and algorithms, more needs to be understood about how to build the systems that run them.\nTo draw an analogy, consider the process of building a car. While many resources are available on the various components of a car, such as the engine, transmission, and suspension, there is often a need for more understanding about how to assemble these components into a functional vehicle. Just as a car requires a well-designed and properly integrated system to operate efficiently and reliably, ML models also require a robust and carefully constructed system to deliver their full potential. Moreover, there is a lot of nuance in building ML systems, given their specific use case. For example, a Formula 1 race car must be assembled differently from an everyday Prius consumer car.\nOur journey started by tracing ML’s historical trajectory, from its theoretical foundations to its current state as a transformative force across industries (Chapter 3). This journey has highlighted the remarkable progress in the field, challenges, and opportunities.\nThroughout this book, we have looked into the intricacies of ML systems, examining the critical components and best practices necessary to create a seamless and efficient pipeline. From data preprocessing and model training to deployment and monitoring, we have provided insights and guidance to help readers navigate the complex landscape of ML system development.\nML systems involve complex workflows, spanning various topics from data engineering to model deployment on diverse systems (Chapter 4). By providing an overview of these ML system components, we have aimed to showcase the tremendous depth and breadth of the field and expertise that is needed. Understanding the intricacies of ML workflows is crucial for practitioners and researchers alike, as it enables them to navigate the landscape effectively and develop robust, efficient, and impactful ML solutions.\nBy focusing on the systems aspect of ML, we aim to bridge the gap between theoretical knowledge and practical implementation. Just as a healthy human body system allows the organs to function optimally, a well-designed ML system enables the models to consistently deliver accurate and reliable results. This book aims to empower readers with the knowledge and tools necessary to build ML systems that showcase the underlying models’ power and ensure smooth integration and operation, much like a well-functioning human body.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2156,7 +2156,7 @@ "section": "20.2 Knowing the Importance of ML Datasets", "text": "20.2 Knowing the Importance of ML Datasets\nOne of the key things we have emphasized is that data is the foundation upon which ML systems are built (Chapter 5). Data is the new code that programs deep neural networks, making data engineering the first and most critical stage of any ML pipeline. That is why we began our exploration by diving into the basics of data engineering, recognizing that quality, diversity, and ethical sourcing are key to building robust and reliable machine learning models.\nThe importance of high-quality data must be balanced. Lapses in data quality can lead to significant negative consequences, such as flawed predictions, project terminations, and even potential harm to communities. These cascading effects, often called “Data Cascades,” highlight the need for diligent data management and governance practices. ML practitioners must prioritize data quality, ensure diversity and representativeness, and adhere to ethical data collection and usage standards. By doing so, we can mitigate the risks associated with poor data quality and build ML systems that are trustworthy, reliable, and beneficial to society.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2167,7 +2167,7 @@ "section": "20.3 Navigating the AI Framework Landscape", "text": "20.3 Navigating the AI Framework Landscape\nThere are many different ML frameworks. Therefore, we dove into the evolution of different ML frameworks, dissecting the inner workings of popular ones like TensorFlow and PyTorch, and provided insights into the core components and advanced features that define them (Chapter 6). We also looked into the specialization of frameworks tailored to specific needs, such as those designed for embedded AI. We discussed the criteria for selecting the most suitable framework for a given project.\nOur exploration also touched upon the future trends expected to shape the landscape of ML frameworks in the coming years. As the field continues to evolve, we can anticipate the emergence of more specialized and optimized frameworks that cater to the unique requirements of different domains and deployment scenarios, as we saw with TensorFlow Lite for Microcontrollers. By staying abreast of these developments and understanding the tradeoffs involved in framework selection, we can make informed decisions and leverage the most appropriate tools to build efficient ML systems.\nMoreover, we expect to see a growing emphasis on framework interoperability and standardization efforts, such as the ONNX (Open Neural Network Exchange) format. This format allows models to be trained in one framework and deployed in another, facilitating greater collaboration and portability across different platforms and environments.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2178,7 +2178,7 @@ "section": "20.4 Understanding ML Training Fundamentals", "text": "20.4 Understanding ML Training Fundamentals\nAs ML practitioners who build ML systems, it is crucial to deeply understand the AI training process and the system challenges in scaling and optimizing it. By leveraging the capabilities of modern AI frameworks and staying up-to-date with the latest advancements in training techniques, we can build robust, efficient, and scalable ML systems that can tackle real-world problems and drive innovation across various domains.\nWe began by examining the fundamentals of AI training (Chapter 7), which involves feeding data into ML models and adjusting their parameters to minimize the difference between predicted and actual outputs. This process is computationally intensive and requires careful consideration of various factors, such as the choice of optimization algorithms, learning rate, batch size, and regularization techniques. Understanding these concepts is crucial for developing effective and efficient training pipelines.\nHowever, training ML models at scale poses significant system challenges. As datasets’ size and models’ complexity grow, the computational resources required for training can become prohibitively expensive. This has led to the development of distributed training techniques, such as data and model parallelism, which allow multiple devices to collaborate in the training process. Frameworks like TensorFlow and PyTorch have evolved to support these distributed training paradigms, enabling practitioners to scale their training workloads across clusters of GPUs or TPUs.\nIn addition to distributed training, we discussed techniques for optimizing the training process, such as mixed-precision training and gradient compression. It’s important to note that while these techniques may seem algorithmic, they significantly impact system performance. The choice of training algorithms, precision, and communication strategies directly affects the ML system’s resource utilization, scalability, and efficiency. Therefore, adopting an algorithm-hardware or algorithm-system co-design approach is crucial, where the algorithmic choices are made in tandem with the system considerations. By understanding the interplay between algorithms and hardware, we can make informed decisions that optimize the model performance and the system efficiency, ultimately leading to more effective and scalable ML solutions.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2189,7 +2189,7 @@ "section": "20.5 Pursuing Efficiency in AI Systems", "text": "20.5 Pursuing Efficiency in AI Systems\nDeploying trained ML models is more complex than simply running the networks; efficiency is critical (Chapter 8). In this chapter on AI efficiency, we emphasized that efficiency is not merely a luxury but a necessity in artificial intelligence systems. We dug into the key concepts underpinning AI systems’ efficiency, recognizing that the computational demands on neural networks can be daunting, even for minimal systems. For AI to be seamlessly integrated into everyday devices and essential systems, it must perform optimally within the constraints of limited resources while maintaining its efficacy.\nThroughout the book, we have highlighted the importance of pursuing efficiency to ensure that AI models are streamlined, rapid, and sustainable. By optimizing models for efficiency, we can widen their applicability across various platforms and scenarios, enabling AI to be deployed in resource-constrained environments such as embedded systems and edge devices. This pursuit of efficiency is crucial for the widespread adoption and practical implementation of AI technologies in real-world applications.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2200,7 +2200,7 @@ "section": "20.6 Optimizing ML Model Architectures", "text": "20.6 Optimizing ML Model Architectures\nWe then explored various model architectures, from the foundational perceptron to the sophisticated transformer networks, each tailored to specific tasks and data types. This exploration has showcased machine learning models’ remarkable diversity and adaptability, enabling them to tackle various problems across domains.\nHowever, when deploying these models on systems, especially resource-constrained embedded systems, model optimization becomes a necessity. The evolution of model architectures, from the early MobileNets designed for mobile devices to the more recent TinyML models optimized for microcontrollers, is a testament to the continued innovation.\nIn the chapter on model optimization (Chapter 9), we looked into the art and science of optimizing machine learning models to ensure they are lightweight, efficient, and effective when deployed in TinyML scenarios. We explored techniques such as model compression, quantization, and architecture search, which allow us to reduce the computational footprint of models while maintaining their performance. By applying these optimization techniques, we can create models tailored to the specific constraints of embedded systems, enabling the deployment of powerful AI capabilities on edge devices. This opens many possibilities for intelligent, real-time processing and decision-making in IoT, robotics, and mobile computing applications. As we continue pushing the boundaries of AI efficiency, we expect to see even more innovative solutions for deploying machine learning models in resource-constrained environments.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2211,7 +2211,7 @@ "section": "20.7 Advancing AI Processing Hardware", "text": "20.7 Advancing AI Processing Hardware\nOver the years, we have witnessed remarkable strides in ML hardware, driven by the insatiable demand for computational power and the need to address the challenges of resource constraints in real-world deployments (Chapter 10). These advancements have been crucial in enabling the deployment of powerful AI capabilities on devices with limited resources, opening up new possibilities across various industries.\nSpecialized hardware acceleration is essential to overcome these constraints and enable high-performance machine learning. Hardware accelerators, such as GPUs, FPGAs, and ASICs, optimize compute-intensive operations, particularly inference, by leveraging custom silicon designed for efficient matrix multiplications. These accelerators provide substantial speedups compared to general-purpose CPUs, enabling real-time execution of advanced ML models on devices with strict size, weight, and power limitations.\nWe have also explored the various techniques and approaches for hardware acceleration in embedded machine-learning systems. We discussed the tradeoffs in selecting the appropriate hardware for specific use cases and the importance of software optimizations to harness these accelerators’ capabilities fully. By understanding these concepts, ML practitioners can make informed decisions when designing and deploying ML systems.\nGiven the plethora of ML hardware solutions available, benchmarking has become essential to developing and deploying machine learning systems (Chapter 11). Benchmarking allows developers to measure and compare the performance of different hardware platforms, model architectures, training procedures, and deployment strategies. By utilizing well-established benchmarks like MLPerf, practitioners gain valuable insights into the most effective approaches for a given problem, considering the unique constraints of the target deployment environment.\nAdvancements in ML hardware, combined with insights gained from benchmarking and optimization techniques, have paved the way for successfully deploying machine learning capabilities on various devices, from powerful edge servers to resource-constrained microcontrollers. As the field continues to evolve, we expect to see even more innovative hardware solutions and benchmarking approaches that will further push the boundaries of what is possible with embedded machine learning systems.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2222,7 +2222,7 @@ "section": "20.8 Embracing On-Device Learning", "text": "20.8 Embracing On-Device Learning\nIn addition to the advancements in ML hardware, we also explored on-device learning, where models can adapt and learn directly on the device (Chapter 12). This approach has significant implications for data privacy and security, as sensitive information can be processed locally without the need for transmission to external servers.\nOn-device learning enhances privacy by keeping data within the confines of the device, reducing the risk of unauthorized access or data breaches. It also reduces reliance on cloud connectivity, enabling ML models to function effectively even in scenarios with limited or intermittent internet access. We have discussed techniques such as transfer learning and federated learning, which have expanded the capabilities of on-device learning. Transfer learning allows models to leverage knowledge gained from one task or domain to improve performance on another, enabling more efficient and effective learning on resource-constrained devices. On the other hand, Federated learning enables collaborative model updates across distributed devices without centralized data aggregation. This approach allows multiple devices to contribute to learning while keeping their data locally, enhancing privacy and security.\nThese advancements in on-device learning have paved the way for more secure, privacy-preserving, and decentralized machine learning applications. As we prioritize data privacy and security in developing ML systems, we expect to see more innovative solutions that enable powerful AI capabilities while protecting sensitive information and ensuring user privacy.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2233,7 +2233,7 @@ "section": "20.9 Streamlining ML Operations", "text": "20.9 Streamlining ML Operations\nEven if we got the above pieces right, challenges and considerations must be addressed to ensure ML models’ successful integration and operation in production environments. In the ML Ops chapter (Chapter 13), we studied the practices and architectures necessary to develop, deploy, and manage ML models throughout their entire lifecycle. We looked at the phases of ML, from data collection and model training to evaluation, deployment, and ongoing monitoring.\nWe learned about the importance of automation, collaboration, and continuous improvement in ML Ops. By automating key processes, teams can streamline their workflows, reduce manual errors, and accelerate the deployment of ML models. Collaboration among diverse teams, including data scientists, engineers, and domain experts, ensures ML systems’ successful development and deployment.\nThe ultimate goal of this chapter was to provide readers with a comprehensive understanding of ML model management, equipping them with the knowledge and tools necessary to build and run ML applications that deliver sustained value successfully. By adopting best practices in ML Ops, organizations can ensure their ML initiatives’ long-term success and impact, driving innovation and delivering meaningful results.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2244,7 +2244,7 @@ "section": "20.10 Ensuring Security and Privacy", "text": "20.10 Ensuring Security and Privacy\nNo ML system is ever complete without thinking about security and privacy. They are of major importance when developing real-world ML systems. As machine learning finds increasing application in sensitive domains such as healthcare, finance, and personal data, safeguarding confidentiality and preventing the misuse of data and models becomes a critical imperative, and these were the concepts we discussed previously (Chapter 14).\nTo build robust and responsible ML systems, practitioners must thoroughly understand the potential security and privacy risks. These risks include data leaks, which can expose sensitive information; model theft, where malicious actors steal trained models; adversarial attacks that can manipulate model behavior; bias in models that can lead to unfair or discriminatory outcomes; and unintended access to private information.\nMitigating these risks requires a deep understanding of best practices in security and privacy. Therefore, we have emphasized that security and privacy cannot be an afterthought—they must be proactively addressed at every stage of the ML system development lifecycle. From the initial stages of data collection and labeling, it is crucial to ensure that data is handled securely and that privacy is protected. During model training and evaluation, techniques such as differential privacy and secure multi-party computation can be employed to safeguard sensitive information.\nWhen deploying ML models, robust access controls, encryption, and monitoring mechanisms must be implemented to prevent unauthorized access and detect potential security breaches. Ongoing monitoring and auditing of ML systems as part of MLOps are also essential to identify and address emerging security or privacy vulnerabilities.\nBy embedding security and privacy considerations into each stage of building, deploying, and managing ML systems, we can safely unlock the benefits of AI while protecting individuals’ rights and ensuring the responsible use of these powerful technologies. Only through this proactive and comprehensive approach can we build ML systems that are not only technologically advanced but also ethically sound and worthy of public trust.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2255,7 +2255,7 @@ "section": "20.11 Upholding Ethical Considerations", "text": "20.11 Upholding Ethical Considerations\nAs we embrace ML advancements in all facets of our lives, it is crucial to remain mindful of the ethical considerations that will shape the future of AI (Chapter 15). Fairness, transparency, accountability, and privacy in AI systems will be paramount as they become more integrated into our lives and decision-making processes.\nAs AI systems become more pervasive and influential, it is essential to ensure that they are designed and deployed in a manner that upholds ethical principles. This means actively mitigating biases, promoting fairness, and preventing discriminatory outcomes. It also ensures transparency in how AI systems make decisions, enabling users to understand and trust their outputs.\nAccountability is another critical ethical consideration. As AI systems take on more responsibilities and make decisions that impact individuals and society, there must be clear mechanisms for holding these systems and their creators accountable. This includes establishing frameworks for auditing and monitoring AI systems and defining liability and redress mechanisms in case of harm or unintended consequences.\nEthical frameworks, regulations, and standards will be essential to address these ethical challenges. These frameworks should guide the responsible development and deployment of AI technologies, ensuring that they align with societal values and promote the well-being of individuals and communities.\nMoreover, ongoing discussions and collaborations among researchers, practitioners, policymakers, and society will be crucial in navigating the ethical landscape of AI. These conversations should be inclusive and diverse, bringing together different perspectives and expertise to develop comprehensive and equitable solutions. As we move forward, it is the collective responsibility of all stakeholders to prioritize ethical considerations in the development and deployment of AI systems.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2266,7 +2266,7 @@ "section": "20.12 Promoting Sustainability and Equity", "text": "20.12 Promoting Sustainability and Equity\nThe increasing computational demands of machine learning, particularly for training large models, have raised concerns about their environmental impact due to high energy consumption and carbon emissions (Chapter 16). As the scale and complexity of models continue to grow, addressing the sustainability challenges associated with AI development becomes imperative. To mitigate the environmental footprint of AI, the development of energy-efficient algorithms is crucial. This involves optimizing models and training procedures to minimize computational requirements while maintaining performance. Techniques such as model compression, quantization, and efficient neural architecture search can help reduce the energy consumption of AI systems.\nUsing renewable energy sources to power AI infrastructure is another important step towards sustainability. By transitioning to clean energy sources such as solar, wind, and hydropower, the carbon emissions associated with AI development can be significantly reduced. This requires a concerted effort from the AI community and support from policymakers and industry leaders to invest in and adopt renewable energy solutions. In addition, exploring alternative computing paradigms, such as neuromorphic and photonic computing, holds promise for developing more energy-efficient AI systems. By developing hardware and algorithms that emulate the brain’s processing mechanisms, we can potentially create AI systems that are both powerful and sustainable.\nThe AI community must prioritize sustainability as a key consideration in research and development. This involves investing in green computing initiatives, such as developing energy-efficient hardware and optimizing data centers for reduced energy consumption. It also requires collaboration across disciplines, bringing together AI, energy, and sustainability experts to develop holistic solutions.\nMoreover, it is important to acknowledge that access to AI and machine learning compute resources may not be equally distributed across organizations and regions. This disparity can lead to a widening gap between those who have the means to leverage advanced AI technologies and those who do not. Organizations like the Organisation for Economic Cooperation and Development (OECD) are actively exploring ways to address this issue and promote greater equity in AI access and adoption. By fostering international cooperation, sharing best practices, and supporting capacity-building initiatives, we can ensure that AI’s benefits are more widely accessible and that no one is left behind in the AI revolution.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2277,7 +2277,7 @@ "section": "20.13 Enhancing Robustness and Resiliency", "text": "20.13 Enhancing Robustness and Resiliency\nThe chapter on Robust AI dives into the fundamental concepts, techniques, and tools for building fault-tolerant and error-resilient ML systems (Chapter 17). In that chapter, we explored how robust AI techniques can address the challenges posed by various types of hardware faults, including transient, permanent, and intermittent faults, as well as software issues such as bugs, design flaws, and implementation errors.\nBy employing robust AI techniques, ML systems can maintain their reliability, safety, and performance even in adverse conditions. These techniques enable systems to detect and recover from faults, adapt to changing environments, and make decisions under uncertainty.\nThe chapter empowers researchers and practitioners to develop AI solutions that can withstand the complexities and uncertainties of real-world environments. It provides insights into the design principles, architectures, and algorithms underpinning robust AI systems and practical guidance on implementing and validating these systems.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2288,7 +2288,7 @@ "section": "20.14 Shaping the Future of ML Systems", "text": "20.14 Shaping the Future of ML Systems\nAs we look to the future, the trajectory of ML systems points towards a paradigm shift from a model-centric approach to a more data-centric one. This shift recognizes that the quality and diversity of data are paramount to developing robust, reliable, and fair AI models.\nWe anticipate a growing emphasis on data curation, labeling, and augmentation techniques in the coming years. These practices aim to ensure that models are trained on high-quality, representative data that accurately reflects the complexities and nuances of real-world scenarios. By focusing on data quality and diversity, we can mitigate the risks of biased or skewed models that may perpetuate unfair or discriminatory outcomes.\nThis data-centric approach will be crucial in addressing the challenges of bias, fairness, and generalizability in ML systems. By actively seeking out and incorporating diverse and inclusive datasets, we can develop more robust, equitable, and applicable models for various contexts and populations. Moreover, the emphasis on data will drive advancements in techniques such as data augmentation, where existing datasets are expanded and diversified through data synthesis, translation, and generation. These techniques can help overcome the limitations of small or imbalanced datasets, enabling the development of more accurate and generalizable models.\nIn recent years, generative AI has taken the field by storm, demonstrating remarkable capabilities in creating realistic images, videos, and text. However, the rise of generative AI also brings new challenges for ML systems (Chapter 18). Unlike traditional ML systems, generative models often demand more computational resources and pose challenges in terms of scalability and efficiency. Furthermore, evaluating and benchmarking generative models presents difficulties, as traditional metrics used for classification tasks may not be directly applicable. Developing robust evaluation frameworks for generative models is an active area of research.\nUnderstanding and addressing these system challenges and ethical considerations will be crucial in shaping the future of generative AI and its impact on society. As ML practitioners and researchers, we are responsible for advancing the technical capabilities of generative models and developing robust systems and frameworks that can mitigate potential risks and ensure the beneficial application of this powerful technology.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2299,7 +2299,7 @@ "section": "20.15 Applying AI for Good", "text": "20.15 Applying AI for Good\nThe potential for AI to be used for social good is vast, provided that responsible ML systems are developed and deployed at scale across various use cases (Chapter 19). To realize this potential, it is essential for researchers and practitioners to actively engage in the process of learning, experimentation, and pushing the boundaries of what is possible.\nThroughout the development of ML systems, it is crucial to remember the key themes and lessons explored in this book. These include the importance of data quality and diversity, the pursuit of efficiency and robustness, the potential of TinyML and neuromorphic computing, and the imperative of security and privacy. These insights inform the work and guide the decisions of those involved in developing AI systems.\nIt is important to recognize that the development of AI is not solely a technical endeavor but also a deeply human one. It requires collaboration, empathy, and a commitment to understanding the societal implications of the systems being created. Engaging with experts from diverse fields, such as ethics, social sciences, and policy, is essential to ensure that the AI systems developed are technically sound, socially responsible, and beneficial. Embracing the opportunity to be part of this transformative field and shaping its future is a privilege and a responsibility. By working together, we can create a world where ML systems serve as tools for positive change and improving the human condition.", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2310,7 +2310,7 @@ "section": "20.16 Congratulations!", "text": "20.16 Congratulations!\nCongratulations on coming this far, and best of luck in your future endeavors! The future of AI is bright and filled with endless possibilities, and I can’t wait to see the incredible contributions you will make.\nFeel free to reach out to me anytime at vj at eecs dot harvard dot edu.\n– Prof. Vijay Janapa Reddi, Harvard University", "crumbs": [ - "MAIN", + "Closing", "20  Conclusion" ] }, @@ -2319,7 +2319,7 @@ "href": "references.html", "title": "References", "section": "", - "text": "Abadi, Martin, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya\nMironov, Kunal Talwar, and Li Zhang. 2016. “Deep Learning with\nDifferential Privacy.” In Proceedings of the 2016 ACM SIGSAC\nConference on Computer and Communications Security, 308–18. CCS\n’16. New York, NY, USA: ACM. https://doi.org/10.1145/2976749.2978318.\n\n\nAbdelkader, Ahmed, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi\nSchwarzschild, Manli Shu, Christoph Studer, and Chen Zhu. 2020.\n“Headless Horseman: Adversarial Attacks on Transfer\nLearning Models.” In ICASSP 2020 - 2020 IEEE International\nConference on Acoustics, Speech and Signal Processing (ICASSP),\n3087–91. IEEE. https://doi.org/10.1109/icassp40776.2020.9053181.\n\n\nAddepalli, Sravanti, B. S. Vivek, Arya Baburaj, Gaurang Sriramanan, and\nR. Venkatesh Babu. 2020. “Towards Achieving Adversarial Robustness\nby Enforcing Feature Consistency Across Bit Planes.” In 2020\nIEEE/CVF Conference on Computer Vision and Pattern Recognition\n(CVPR), 1020–29. IEEE. https://doi.org/10.1109/cvpr42600.2020.00110.\n\n\nAdolf, Robert, Saketh Rama, Brandon Reagen, Gu-yeon Wei, and David\nBrooks. 2016. “Fathom: Reference Workloads for Modern\nDeep Learning Methods.” In 2016 IEEE International Symposium\non Workload Characterization (IISWC), 1–10. IEEE; IEEE. https://doi.org/10.1109/iiswc.2016.7581275.\n\n\nAgarwal, Alekh, Alina Beygelzimer, Miroslav Dudı́k, John Langford, and\nHanna M. Wallach. 2018. “A Reductions Approach to Fair\nClassification.” In Proceedings of the 35th International\nConference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm,\nSweden, July 10-15, 2018, edited by Jennifer G. Dy and Andreas\nKrause, 80:60–69. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v80/agarwal18a.html.\n\n\nAgnesina, Anthony, Puranjay Rajvanshi, Tian Yang, Geraldo Pradipta,\nAustin Jiao, Ben Keller, Brucek Khailany, and Haoxing Ren. 2023.\n“AutoDMP: Automated DREAMPlace-Based Macro\nPlacement.” In Proceedings of the 2023 International\nSymposium on Physical Design, 149–57. ACM. https://doi.org/10.1145/3569052.3578923.\n\n\nAgrawal, Dakshi, Selcuk Baktir, Deniz Karakoyunlu, Pankaj Rohatgi, and\nBerk Sunar. 2007. “Trojan Detection Using\nIC Fingerprinting.” In 2007 IEEE Symposium on\nSecurity and Privacy (SP ’07), 29–45. Springer; IEEE. https://doi.org/10.1109/sp.2007.36.\n\n\nAhmadilivani, Mohammad Hasan, Mahdi Taheri, Jaan Raik, Masoud\nDaneshtalab, and Maksim Jenihhin. 2024. “A Systematic Literature\nReview on Hardware Reliability Assessment Methods for Deep Neural\nNetworks.” ACM Comput. Surv. 56 (6): 1–39. https://doi.org/10.1145/3638242.\n\n\nAledhari, Mohammed, Rehma Razzak, Reza M. Parizi, and Fahad Saeed. 2020.\n“Federated Learning: A Survey on Enabling\nTechnologies, Protocols, and Applications.” #IEEE_O_ACC#\n8: 140699–725. https://doi.org/10.1109/access.2020.3013541.\n\n\nAlghamdi, Wael, Hsiang Hsu, Haewon Jeong, Hao Wang, Peter Michalak,\nShahab Asoodeh, and Flavio Calmon. 2022. “Beyond Adult and\nCOMPAS: Fair Multi-Class Prediction via\nInformation Projection.” Adv. Neur. In. 35: 38747–60.\n\n\nAltayeb, Moez, Marco Zennaro, and Marcelo Rovai. 2022.\n“Classifying Mosquito Wingbeat Sound Using\nTinyML.” In Proceedings of the 2022 ACM\nConference on Information Technology for Social Good, 132–37. ACM.\nhttps://doi.org/10.1145/3524458.3547258.\n\n\nAmiel, Frederic, Christophe Clavier, and Michael Tunstall. 2006.\n“Fault Analysis of DPA-Resistant Algorithms.”\nIn International Workshop on Fault Diagnosis and Tolerance in\nCryptography, 223–36. Springer.\n\n\nAnsel, Jason, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain,\nMichael Voznesensky, Bin Bao, et al. 2024. “PyTorch\n2: Faster Machine Learning Through Dynamic Python Bytecode\nTransformation and Graph Compilation.” In Proceedings of the\n29th ACM International Conference on Architectural Support for\nProgramming Languages and Operating Systems, Volume 2, edited by\nHanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence\nd’Alché-Buc, Emily B. Fox, and Roman Garnett, 8024–35. ACM. https://doi.org/10.1145/3620665.3640366.\n\n\nAnthony, Lasse F. Wolff, Benjamin Kanding, and Raghavendra Selvan. 2020.\nICML Workshop on Challenges in Deploying and monitoring Machine Learning\nSystems.\n\n\nAntol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv\nBatra, C. Lawrence Zitnick, and Devi Parikh. 2015.\n“VQA: Visual Question Answering.”\nIn 2015 IEEE International Conference on Computer Vision\n(ICCV), 2425–33. IEEE. https://doi.org/10.1109/iccv.2015.279.\n\n\nAntonakakis, Manos, Tim April, Michael Bailey, Matt Bernhard, Elie\nBursztein, Jaime Cochran, Zakir Durumeric, et al. 2017.\n“Understanding the Mirai Botnet.” In 26th USENIX\nSecurity Symposium (USENIX Security 17), 1093–1110.\n\n\nArdila, Rosana, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer,\nMichael Henretty, Reuben Morais, Lindsay Saunders, Francis Tyers, and\nGregor Weber. 2020. “Common Voice: A\nMassively-Multilingual Speech Corpus.” In Proceedings of the\nTwelfth Language Resources and Evaluation Conference, 4218–22.\nMarseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.520.\n\n\nArifeen, Tooba, Abdus Sami Hassan, and Jeong-A Lee. 2020.\n“Approximate Triple Modular Redundancy: A\nSurvey.” #IEEE_O_ACC# 8: 139851–67. https://doi.org/10.1109/access.2020.3012673.\n\n\nAsonov, D., and R. Agrawal. 2004. “Keyboard Acoustic\nEmanations.” In IEEE Symposium on Security and Privacy, 2004.\nProceedings. 2004, 3–11. IEEE; IEEE. https://doi.org/10.1109/secpri.2004.1301311.\n\n\nAteniese, Giuseppe, Luigi V. Mancini, Angelo Spognardi, Antonio Villani,\nDomenico Vitali, and Giovanni Felici. 2015. “Hacking Smart\nMachines with Smarter Ones: How to Extract Meaningful Data\nfrom Machine Learning Classifiers.” Int. J. Secur. Netw.\n10 (3): 137. https://doi.org/10.1504/ijsn.2015.071829.\n\n\nAttia, Zachi I., Alan Sugrue, Samuel J. Asirvatham, Michael J. Ackerman,\nSuraj Kapa, Paul A. Friedman, and Peter A. Noseworthy. 2018.\n“Noninvasive Assessment of Dofetilide Plasma Concentration Using a\nDeep Learning (Neural Network) Analysis of the Surface\nElectrocardiogram: A Proof of Concept Study.”\nPLoS One 13 (8): e0201059. https://doi.org/10.1371/journal.pone.0201059.\n\n\nBai, Tao, Jinqi Luo, Jun Zhao, Bihan Wen, and Qian Wang. 2021.\n“Recent Advances in Adversarial Training for Adversarial\nRobustness.” arXiv Preprint arXiv:2102.01356.\n\n\nBains, Sunny. 2020. “The Business of Building Brains.”\nNature Electronics 3 (7): 348–51. https://doi.org/10.1038/s41928-020-0449-1.\n\n\nBamoumen, Hatim, Anas Temouden, Nabil Benamar, and Yousra Chtouki. 2022.\n“How TinyML Can Be Leveraged to Solve Environmental\nProblems: A Survey.” In 2022 International\nConference on Innovation and Intelligence for Informatics, Computing,\nand Technologies (3ICT), 338–43. IEEE; IEEE. https://doi.org/10.1109/3ict56508.2022.9990661.\n\n\nBank, Dor, Noam Koenigstein, and Raja Giryes. 2023.\n“Autoencoders.” Machine Learning for Data Science\nHandbook: Data Mining and Knowledge Discovery Handbook, 353–74.\n\n\nBannon, Pete, Ganesh Venkataramanan, Debjit Das Sarma, and Emil Talpes.\n2019. “Computer and Redundancy Solution for the Full Self-Driving\nComputer.” In 2019 IEEE Hot Chips 31 Symposium (HCS),\n1–22. IEEE Computer Society; IEEE. https://doi.org/10.1109/hotchips.2019.8875645.\n\n\nBarenghi, Alessandro, Guido M. Bertoni, Luca Breveglieri, Mauro\nPellicioli, and Gerardo Pelosi. 2010. “Low Voltage Fault Attacks\nto AES.” In 2010 IEEE International Symposium on\nHardware-Oriented Security and Trust (HOST), 7–12. IEEE; IEEE. https://doi.org/10.1109/hst.2010.5513121.\n\n\nBarroso, Luiz André, Urs Hölzle, and Parthasarathy Ranganathan. 2019.\nThe Datacenter as a Computer: Designing Warehouse-Scale\nMachines. Springer International Publishing. https://doi.org/10.1007/978-3-031-01761-2.\n\n\nBau, David, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba.\n2017. “Network Dissection: Quantifying\nInterpretability of Deep Visual Representations.” In 2017\nIEEE Conference on Computer Vision and Pattern Recognition (CVPR),\n3319–27. IEEE. https://doi.org/10.1109/cvpr.2017.354.\n\n\nBeaton, Albert E., and John W. Tukey. 1974. “The Fitting of Power\nSeries, Meaning Polynomials, Illustrated on Band-Spectroscopic\nData.” Technometrics 16 (2): 147. https://doi.org/10.2307/1267936.\n\n\nBeck, Nathaniel, and Simon Jackman. 1998. “Beyond Linearity by\nDefault: Generalized Additive Models.” Am. J.\nPolit. Sci. 42 (2): 596. https://doi.org/10.2307/2991772.\n\n\nBender, Emily M., and Batya Friedman. 2018. “Data Statements for\nNatural Language Processing: Toward Mitigating System Bias\nand Enabling Better Science.” Transactions of the Association\nfor Computational Linguistics 6 (December): 587–604. https://doi.org/10.1162/tacl_a_00041.\n\n\nBerger, Vance W, and YanYan Zhou. 2014.\n“Kolmogorovsmirnov Test:\nOverview.” Wiley Statsref: Statistics Reference\nOnline.\n\n\nBeyer, Lucas, Olivier J Hénaff, Alexander Kolesnikov, Xiaohua Zhai, and\nAäron van den Oord. 2020. “Are We Done with Imagenet?”\nArXiv Preprint abs/2006.07159. https://arxiv.org/abs/2006.07159.\n\n\nBhagoji, Arjun Nitin, Warren He, Bo Li, and Dawn Song. 2018.\n“Practical Black-Box Attacks on Deep Neural Networks Using\nEfficient Query Mechanisms.” In Proceedings of the European\nConference on Computer Vision (ECCV), 154–69.\n\n\nBhardwaj, Kshitij, Marton Havasi, Yuan Yao, David M. Brooks, José Miguel\nHernández-Lobato, and Gu-Yeon Wei. 2020. “A Comprehensive\nMethodology to Determine Optimal Coherence Interfaces for\nMany-Accelerator SoCs.” In Proceedings of the\nACM/IEEE International Symposium on Low Power Electronics and\nDesign, 145–50. ACM. https://doi.org/10.1145/3370748.3406564.\n\n\nBiega, Asia J., Peter Potash, Hal Daumé, Fernando Diaz, and Michèle\nFinck. 2020. “Operationalizing the Legal Principle of Data\nMinimization for Personalization.” In Proceedings of the 43rd\nInternational ACM SIGIR Conference on Research and Development in\nInformation Retrieval, edited by Jimmy Huang, Yi Chang, Xueqi\nCheng, Jaap Kamps, Vanessa Murdock, Ji-Rong Wen, and Yiqun Liu, 399–408.\nACM. https://doi.org/10.1145/3397271.3401034.\n\n\nBiggio, Battista, Blaine Nelson, and Pavel Laskov. 2012.\n“Poisoning Attacks Against Support Vector Machines.” In\nProceedings of the 29th International Conference on Machine\nLearning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1,\n2012. icml.cc / Omnipress. http://icml.cc/2012/papers/880.pdf.\n\n\nBiggs, John, James Myers, Jedrzej Kufel, Emre Ozer, Simon Craske, Antony\nSou, Catherine Ramsdale, Ken Williamson, Richard Price, and Scott White.\n2021. “A Natively Flexible 32-Bit Arm Microprocessor.”\nNature 595 (7868): 532–36. https://doi.org/10.1038/s41586-021-03625-w.\n\n\nBinkert, Nathan, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt,\nAli Saidi, Arkaprava Basu, Joel Hestness, et al. 2011. “The Gem5\nSimulator.” ACM SIGARCH Computer Architecture News 39\n(2): 1–7. https://doi.org/10.1145/2024716.2024718.\n\n\nBohr, Adam, and Kaveh Memarzadeh. 2020. “The Rise of Artificial\nIntelligence in Healthcare Applications.” In Artificial\nIntelligence in Healthcare, 25–60. Elsevier. https://doi.org/10.1016/b978-0-12-818438-7.00002-2.\n\n\nBolchini, Cristiana, Luca Cassano, Antonio Miele, and Alessandro Toschi.\n2023. “Fast and Accurate Error Simulation for CNNs\nAgainst Soft Errors.” IEEE Trans. Comput. 72 (4):\n984–97. https://doi.org/10.1109/tc.2022.3184274.\n\n\nBondi, Elizabeth, Ashish Kapoor, Debadeepta Dey, James Piavis, Shital\nShah, Robert Hannaford, Arvind Iyer, Lucas Joppa, and Milind Tambe.\n2018. “Near Real-Time Detection of Poachers from Drones in\nAirSim.” In Proceedings of the Twenty-Seventh\nInternational Joint Conference on Artificial Intelligence, edited\nby Jérôme Lang, 5814–16. International Joint Conferences on Artificial\nIntelligence Organization. https://doi.org/10.24963/ijcai.2018/847.\n\n\nBourtoule, Lucas, Varun Chandrasekaran, Christopher A. Choquette-Choo,\nHengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas\nPapernot. 2021. “Machine Unlearning.” In 2021 IEEE\nSymposium on Security and Privacy (SP), 141–59. IEEE; IEEE. https://doi.org/10.1109/sp40001.2021.00019.\n\n\nBreier, Jakub, Xiaolu Hou, Dirmanto Jap, Lei Ma, Shivam Bhasin, and Yang\nLiu. 2018. “Deeplaser: Practical Fault Attack on Deep\nNeural Networks.” ArXiv Preprint abs/1806.05859. https://arxiv.org/abs/1806.05859.\n\n\nBrown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan,\nPrafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language\nModels Are Few-Shot Learners.” In Advances in Neural\nInformation Processing Systems 33: Annual Conference on Neural\nInformation Processing Systems 2020, NeurIPS 2020, December 6-12, 2020,\nVirtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia\nHadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.\n\n\nBuolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades:\nIntersectional Accuracy Disparities in Commercial Gender\nClassification.” In Conference on Fairness, Accountability\nand Transparency, 77–91. PMLR.\n\n\nBurnet, David, and Richard Thomas. 1989. “Spycatcher:\nThe Commodification of Truth.” J. Law Soc.\n16 (2): 210. https://doi.org/10.2307/1410360.\n\n\nBurr, Geoffrey W., Matthew J. BrightSky, Abu Sebastian, Huai-Yu Cheng,\nJau-Yi Wu, Sangbum Kim, Norma E. Sosa, et al. 2016. “Recent\nProgress in Phase-Change<?Pub _Newline ?>Memory Technology.”\nIEEE Journal on Emerging and Selected Topics in Circuits and\nSystems 6 (2): 146–62. https://doi.org/10.1109/jetcas.2016.2547718.\n\n\nBushnell, Michael L, and Vishwani D Agrawal. 2002. “Built-in\nSelf-Test.” Essentials of Electronic Testing for Digital,\nMemory and Mixed-Signal VLSI Circuits, 489–548.\n\n\nBuyya, Rajkumar, Anton Beloglazov, and Jemal Abawajy. 2010.\n“Energy-Efficient Management of Data Center Resources for Cloud\nComputing: A Vision, Architectural Elements, and Open\nChallenges.” https://arxiv.org/abs/1006.0308.\n\n\nCai, Carrie J., Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel\nSmilkov, Martin Wattenberg, et al. 2019. “Human-Centered Tools for\nCoping with Imperfect Algorithms During Medical Decision-Making.”\nIn Proceedings of the 2019 CHI Conference on Human Factors in\nComputing Systems, edited by Jennifer G. Dy and Andreas Krause,\n80:2673–82. Proceedings of Machine Learning Research. ACM. https://doi.org/10.1145/3290605.3300234.\n\n\nCai, Han, Chuang Gan, Ligeng Zhu, and Song Han. 2020.\n“TinyTL: Reduce Memory, Not Parameters\nfor Efficient on-Device Learning.” In Advances in Neural\nInformation Processing Systems 33: Annual Conference on Neural\nInformation Processing Systems 2020, NeurIPS 2020, December 6-12, 2020,\nVirtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia\nHadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/81f7acabd411274fcf65ce2070ed568a-Abstract.html.\n\n\nCai, Han, Ligeng Zhu, and Song Han. 2019.\n“ProxylessNAS: Direct Neural\nArchitecture Search on Target Task and Hardware.” In 7th\nInternational Conference on Learning Representations, ICLR 2019, New\nOrleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=HylVB3AqYm.\n\n\nCalvo, Rafael A, Dorian Peters, Karina Vold, and Richard M Ryan. 2020.\n“Supporting Human Autonomy in AI Systems:\nA Framework for Ethical Enquiry.” Ethics of\nDigital Well-Being: A Multidisciplinary Approach, 31–54.\n\n\nCarlini, Nicholas, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah\nSherr, Clay Shields, David Wagner, and Wenchao Zhou. 2016. “Hidden\nVoice Commands.” In 25th USENIX Security Symposium (USENIX\nSecurity 16), 513–30.\n\n\nCarta, Salvatore, Alessandro Sebastian Podda, Diego Reforgiato Recupero,\nand Roberto Saia. 2020. “A Local Feature Engineering Strategy to\nImprove Network Anomaly Detection.” Future Internet 12\n(10): 177. https://doi.org/10.3390/fi12100177.\n\n\nCavoukian, Ann. 2009. “Privacy by Design.” Office of\nthe Information and Privacy Commissioner.\n\n\nCenci, Marcelo Pilotto, Tatiana Scarazzato, Daniel Dotto Munchen, Paula\nCristina Dartora, Hugo Marcelo Veit, Andrea Moura Bernardes, and Pablo\nR. Dias. 2021. “Eco-Friendly\nElectronicsA Comprehensive Review.”\nAdv. Mater. Technol. 7 (2): 2001263. https://doi.org/10.1002/admt.202001263.\n\n\nChallenge, WEF Net-Zero. 2021. “The Supply Chain\nOpportunity.” In World Economic Forum: Geneva,\nSwitzerland.\n\n\nChandola, Varun, Arindam Banerjee, and Vipin Kumar. 2009. “Anomaly\nDetection: A Survey.” ACM Comput. Surv. 41 (3): 1–58. https://doi.org/10.1145/1541880.1541882.\n\n\nChapelle, O., B. Scholkopf, and A. Zien Eds. 2009.\n“Semi-Supervised Learning (Chapelle, O.\nEt Al., Eds.; 2006) [Book Reviews].” IEEE Trans.\nNeural Networks 20 (3): 542–42. https://doi.org/10.1109/tnn.2009.2015974.\n\n\nChen, Chaofan, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and\nJonathan Su. 2019. “This Looks Like That: Deep\nLearning for Interpretable Image Recognition.” In Advances in\nNeural Information Processing Systems 32: Annual Conference on Neural\nInformation Processing Systems 2019, NeurIPS 2019, December 8-14, 2019,\nVancouver, BC, Canada, edited by Hanna M. Wallach, Hugo Larochelle,\nAlina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman\nGarnett, 8928–39. https://proceedings.neurips.cc/paper/2019/hash/adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html.\n\n\nChen, Emma, Shvetank Prakash, Vijay Janapa Reddi, David Kim, and Pranav\nRajpurkar. 2023. “A Framework for Integrating Artificial\nIntelligence for Clinical Care with Continuous Therapeutic\nMonitoring.” Nat. Biomed. Eng., November. https://doi.org/10.1038/s41551-023-01115-0.\n\n\nChen, H.-W. 2006. “Gallium, Indium, and Arsenic Pollution of\nGroundwater from a Semiconductor Manufacturing Area of\nTaiwan.” B. Environ. Contam. Tox. 77 (2):\n289–96. https://doi.org/10.1007/s00128-006-1062-3.\n\n\nChen, Tianqi, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan,\nHaichen Shen, Meghan Cowan, et al. 2018. “TVM:\nAn Automated End-to-End Optimizing Compiler for Deep\nLearning.” In 13th USENIX Symposium on Operating Systems\nDesign and Implementation (OSDI 18), 578–94.\n\n\nChen, Tianqi, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016.\n“Training Deep Nets with Sublinear Memory Cost.” ArXiv\nPreprint abs/1604.06174. https://arxiv.org/abs/1604.06174.\n\n\nChen, Zhiyong, and Shugong Xu. 2023. “Learning\nDomain-Heterogeneous Speaker Recognition Systems with Personalized\nContinual Federated Learning.” EURASIP Journal on Audio,\nSpeech, and Music Processing 2023 (1): 33. https://doi.org/10.1186/s13636-023-00299-2.\n\n\nChen, Zitao, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben.\n2019. “<i>BinFI</i>: An Efficient Fault Injector for\nSafety-Critical Machine Learning Systems.” In Proceedings of\nthe International Conference for High Performance Computing, Networking,\nStorage and Analysis. SC ’19. New York, NY, USA: ACM. https://doi.org/10.1145/3295500.3356177.\n\n\nChen, Zitao, Niranjhana Narayanan, Bo Fang, Guanpeng Li, Karthik\nPattabiraman, and Nathan DeBardeleben. 2020.\n“TensorFI: A Flexible Fault Injection\nFramework for TensorFlow Applications.” In 2020\nIEEE 31st International Symposium on Software Reliability Engineering\n(ISSRE), 426–35. IEEE; IEEE. https://doi.org/10.1109/issre5003.2020.00047.\n\n\nCheng, Eric, Shahrzad Mirkhani, Lukasz G. Szafaryn, Chen-Yong Cher,\nHyungmin Cho, Kevin Skadron, Mircea R. Stan, et al. 2016. “Clear:\n<U>c</u>\nRoss <u>-l</u>\nAyer <u>e</u>\nXploration for <u>a</u>\nRchitecting <u>r</u>\nEsilience - Combining Hardware and Software Techniques to Tolerate Soft\nErrors in Processor Cores.” In Proceedings of the 53rd Annual\nDesign Automation Conference, 1–6. ACM. https://doi.org/10.1145/2897937.2897996.\n\n\nCheng, Yu, Duo Wang, Pan Zhou, and Tao Zhang. 2018. “Model\nCompression and Acceleration for Deep Neural Networks: The\nPrinciples, Progress, and Challenges.” IEEE Signal Process\nMag. 35 (1): 126–36. https://doi.org/10.1109/msp.2017.2765695.\n\n\nChi, Ping, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu,\nYu Wang, and Yuan Xie. 2016. “Prime: A Novel Processing-in-Memory\nArchitecture for Neural Network Computation in ReRAM-Based Main\nMemory.” ACM SIGARCH Computer Architecture News 44 (3):\n27–39. https://doi.org/10.1145/3007787.3001140.\n\n\nChollet, François. 2018. “Introduction to Keras.” March\n9th.\n\n\nChristiano, Paul F., Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg,\nand Dario Amodei. 2017. “Deep Reinforcement Learning from Human\nPreferences.” In Advances in Neural Information Processing\nSystems 30: Annual Conference on Neural Information Processing Systems\n2017, December 4-9, 2017, Long Beach, CA, USA, edited by Isabelle\nGuyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S.\nV. N. Vishwanathan, and Roman Garnett, 4299–4307. https://proceedings.neurips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html.\n\n\nChu, Grace, Okan Arikan, Gabriel Bender, Weijun Wang, Achille Brighton,\nPieter-Jan Kindermans, Hanxiao Liu, Berkin Akin, Suyog Gupta, and Andrew\nHoward. 2021. “Discovering Multi-Hardware Mobile Models via\nArchitecture Search.” In 2021 IEEE/CVF Conference on Computer\nVision and Pattern Recognition Workshops (CVPRW), 3022–31. IEEE. https://doi.org/10.1109/cvprw53098.2021.00337.\n\n\nChua, L. 1971. “Memristor-the Missing Circuit Element.”\n#IEEE_J_CT# 18 (5): 507–19. https://doi.org/10.1109/tct.1971.1083337.\n\n\nChung, Jae-Won, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, and\nMosharaf Chowdhury. 2023. “Perseus: Removing Energy\nBloat from Large Model Training.” ArXiv Preprint\nabs/2312.06902. https://arxiv.org/abs/2312.06902.\n\n\nCohen, Maxime C., Ruben Lobel, and Georgia Perakis. 2016. “The\nImpact of Demand Uncertainty on Consumer Subsidies for Green Technology\nAdoption.” Manage. Sci. 62 (5): 1235–58. https://doi.org/10.1287/mnsc.2015.2173.\n\n\nColeman, Cody, Edward Chou, Julian Katz-Samuels, Sean Culatana, Peter\nBailis, Alexander C. Berg, Robert D. Nowak, Roshan Sumbaly, Matei\nZaharia, and I. Zeki Yalniz. 2022. “Similarity Search for\nEfficient Active Learning and Search of Rare Concepts.” In\nThirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022,\nThirty-Fourth Conference on Innovative Applications of Artificial\nIntelligence, IAAI 2022, the Twelveth Symposium on Educational Advances\nin Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March\n1, 2022, 6402–10. AAAI Press. https://ojs.aaai.org/index.php/AAAI/article/view/20591.\n\n\nColeman, Cody, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao,\nJian Zhang, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia.\n2019. “Analysis of DAWNBench, a Time-to-Accuracy\nMachine Learning Performance Benchmark.” ACM SIGOPS Operating\nSystems Review 53 (1): 14–25. https://doi.org/10.1145/3352020.3352024.\n\n\nConstantinescu, Cristian. 2008. “Intermittent Faults and Effects\non Reliability of Integrated Circuits.” In 2008 Annual\nReliability and Maintainability Symposium, 370–74. IEEE; IEEE. https://doi.org/10.1109/rams.2008.4925824.\n\n\nCooper, Tom, Suzanne Fallender, Joyann Pafumi, Jon Dettling, Sebastien\nHumbert, and Lindsay Lessard. 2011. “A Semiconductor Company’s\nExamination of Its Water Footprint Approach.” In Proceedings\nof the 2011 IEEE International Symposium on Sustainable Systems and\nTechnology, 1–6. IEEE; IEEE. https://doi.org/10.1109/issst.2011.5936865.\n\n\nCope, Gord. 2009. “Pure Water, Semiconductors and the\nRecession.” Global Water Intelligence 10 (10).\n\n\nD’ignazio, Catherine, and Lauren F Klein. 2023. Data Feminism.\nMIT press.\n\n\nDahl, George E, Frank Schneider, Zachary Nado, Naman Agarwal,\nChandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, et al.\n2021. “CSF Findings in Acute NMDAR and\nLGI1 AntibodyAssociated\nAutoimmune Encephalitis.” Neurology Neuroimmunology &Amp;\nNeuroinflammation 8 (6). https://doi.org/10.1212/nxi.0000000000001086.\n\n\nDarvish Rouhani, Bita, Azalia Mirhoseini, and Farinaz Koushanfar. 2017.\n“TinyDL: Just-in-time\nDeep Learning Solution for Constrained Embedded Systems.” In\n2017 IEEE International Symposium on Circuits and Systems\n(ISCAS), 1–4. IEEE. https://doi.org/10.1109/iscas.2017.8050343.\n\n\nDavarzani, Samaneh, David Saucier, Purva Talegaonkar, Erin Parker, Alana\nTurner, Carver Middleton, Will Carroll, et al. 2023. “Closing the\nWearable Gap: Footankle\nKinematic Modeling via Deep Learning Models Based on a Smart Sock\nWearable.” Wearable Technologies 4. https://doi.org/10.1017/wtc.2023.3.\n\n\nDavid, Robert, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat\nJeffries, Jian Li, Nick Kreeger, et al. 2021. “Tensorflow Lite\nMicro: Embedded Machine Learning for Tinyml\nSystems.” Proceedings of Machine Learning and Systems 3:\n800–811.\n\n\nDavies, Emma. 2011. “Endangered Elements: Critical\nThinking.” https://www.rsc.org/images/Endangered\\%20Elements\\%20-\\%20Critical\\%20Thinking\\_tcm18-196054.pdf.\n\n\nDavies, Mike, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya,\nYongqiang Cao, Sri Harsha Choday, Georgios Dimou, et al. 2018.\n“Loihi: A Neuromorphic Manycore Processor with\non-Chip Learning.” IEEE Micro 38 (1): 82–99. https://doi.org/10.1109/mm.2018.112130359.\n\n\nDavies, Mike, Andreas Wild, Garrick Orchard, Yulia Sandamirskaya,\nGabriel A. Fonseca Guerra, Prasad Joshi, Philipp Plank, and Sumedh R.\nRisbud. 2021. “Advancing Neuromorphic Computing with Loihi:\nA Survey of Results and Outlook.” Proc.\nIEEE 109 (5): 911–34. https://doi.org/10.1109/jproc.2021.3067593.\n\n\nDavis, Jacqueline, Daniel Bizo, Andy Lawrence, Owen Rogers, and Max\nSmolaks. 2022. “Uptime Institute Global Data Center Survey\n2022.” Uptime Institute.\n\n\nDayarathna, Miyuru, Yonggang Wen, and Rui Fan. 2016. “Data Center\nEnergy Consumption Modeling: A Survey.” IEEE\nCommunications Surveys &Amp; Tutorials 18 (1): 732–94. https://doi.org/10.1109/comst.2015.2481183.\n\n\nDean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc\nV. Le, Mark Z. Mao, et al. 2012. “Large Scale Distributed Deep\nNetworks.” In Advances in Neural Information Processing\nSystems 25: 26th Annual Conference on Neural Information Processing\nSystems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake\nTahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando\nC. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q.\nWeinberger, 1232–40. https://proceedings.neurips.cc/paper/2012/hash/6aca97005c68f1206823815f66102863-Abstract.html.\n\n\nDeng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li.\n2009. “ImageNet: A Large-Scale\nHierarchical Image Database.” In 2009 IEEE Conference on\nComputer Vision and Pattern Recognition, 248–55. IEEE. https://doi.org/10.1109/cvpr.2009.5206848.\n\n\nDesai, Tanvi, Felix Ritchie, Richard Welpton, et al. 2016. “Five\nSafes: Designing Data Access for Research.”\nEconomics Working Paper Series 1601: 28.\n\n\nDevlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.\n“BERT: Pre-training of\nDeep Bidirectional Transformers for Language Understanding.” In\nProceedings of the 2019 Conference of the North, 4171–86.\nMinneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/n19-1423.\n\n\nDhar, Sauptik, Junyao Guo, Jiayi (Jason) Liu, Samarth Tripathi, Unmesh\nKurup, and Mohak Shah. 2021. “A Survey of on-Device Machine\nLearning: An Algorithms and Learning Theory Perspective.” ACM\nTransactions on Internet of Things 2 (3): 1–49. https://doi.org/10.1145/3450494.\n\n\nDong, Xin, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T.\nKung, and Ziyun Li. 2022. “SplitNets:\nDesigning Neural Architectures for Efficient Distributed\nComputing on Head-Mounted Systems.” In 2022 IEEE/CVF\nConference on Computer Vision and Pattern Recognition (CVPR),\n12549–59. IEEE. https://doi.org/10.1109/cvpr52688.2022.01223.\n\n\nDongarra, Jack J. 2009. “The Evolution of High Performance\nComputing on System z.” IBM J. Res. Dev. 53: 3–4.\n\n\nDuarte, Javier, Nhan Tran, Ben Hawks, Christian Herwig, Jules Muhizi,\nShvetank Prakash, and Vijay Janapa Reddi. 2022.\n“FastML Science Benchmarks: Accelerating\nReal-Time Scientific Edge Machine Learning.” ArXiv\nPreprint abs/2207.07958. https://arxiv.org/abs/2207.07958.\n\n\nDuchi, John C., Elad Hazan, and Yoram Singer. 2010. “Adaptive\nSubgradient Methods for Online Learning and Stochastic\nOptimization.” In COLT 2010 - the 23rd Conference on Learning\nTheory, Haifa, Israel, June 27-29, 2010, edited by Adam Tauman\nKalai and Mehryar Mohri, 257–69. Omnipress. http://colt2010.haifa.il.ibm.com/papers/COLT2010proceedings.pdf\\#page=265.\n\n\nDuisterhof, Bardienus P, Srivatsan Krishnan, Jonathan J Cruz, Colby R\nBanbury, William Fu, Aleksandra Faust, Guido CHE de Croon, and Vijay\nJanapa Reddi. 2019. “Learning to Seek: Autonomous\nSource Seeking with Deep Reinforcement Learning Onboard a Nano Drone\nMicrocontroller.” ArXiv Preprint abs/1909.11236. https://arxiv.org/abs/1909.11236.\n\n\nDuisterhof, Bardienus P., Shushuai Li, Javier Burgues, Vijay Janapa\nReddi, and Guido C. H. E. de Croon. 2021. “Sniffy Bug:\nA Fully Autonomous Swarm of Gas-Seeking Nano Quadcopters in\nCluttered Environments.” In 2021 IEEE/RSJ International\nConference on Intelligent Robots and Systems (IROS), 9099–9106.\nIEEE; IEEE. https://doi.org/10.1109/iros51168.2021.9636217.\n\n\nDwork, Cynthia, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006.\n“Calibrating Noise to Sensitivity in Private Data\nAnalysis.” In Theory of Cryptography, edited by Shai\nHalevi and Tal Rabin, 265–84. Berlin, Heidelberg: Springer Berlin\nHeidelberg.\n\n\nDwork, Cynthia, and Aaron Roth. 2013. “The Algorithmic Foundations\nof Differential Privacy.” Foundations and Trends\nin Theoretical Computer Science 9 (3-4): 211–407. https://doi.org/10.1561/0400000042.\n\n\nEbrahimi, Khosrow, Gerard F. Jones, and Amy S. Fleischer. 2014. “A\nReview of Data Center Cooling Technology, Operating Conditions and the\nCorresponding Low-Grade Waste Heat Recovery Opportunities.”\nRenewable Sustainable Energy Rev. 31 (March): 622–38. https://doi.org/10.1016/j.rser.2013.12.007.\n\n\nEgwutuoha, Ifeanyi P., David Levy, Bran Selic, and Shiping Chen. 2013.\n“A Survey of Fault Tolerance Mechanisms and Checkpoint/Restart\nImplementations for High Performance Computing Systems.” The\nJournal of Supercomputing 65 (3): 1302–26. https://doi.org/10.1007/s11227-013-0884-0.\n\n\nEisenman, Assaf, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere,\nRaghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, and\nMurali Annavaram. 2022. “Check-n-Run: A Checkpointing\nSystem for Training Deep Learning Recommendation Models.” In\n19th USENIX Symposium on Networked Systems Design and Implementation\n(NSDI 22), 929–43.\n\n\nEldan, Ronen, and Mark Russinovich. 2023. “Who’s Harry Potter?\nApproximate Unlearning in LLMs.” ArXiv\nPreprint abs/2310.02238. https://arxiv.org/abs/2310.02238.\n\n\nEl-Rayis, A. O. 2014. “Reconfigurable Architectures for the Next\nGeneration of Mobile Device Telecommunications Systems.” :\nhttps://www.researchgate.net/publication/292608967.\n\n\nEshraghian, Jason K., Max Ward, Emre O. Neftci, Xinxin Wang, Gregor\nLenz, Girish Dwivedi, Mohammed Bennamoun, Doo Seok Jeong, and Wei D. Lu.\n2023. “Training Spiking Neural Networks Using Lessons from Deep\nLearning.” Proc. IEEE 111 (9): 1016–54. https://doi.org/10.1109/jproc.2023.3308088.\n\n\nEsteva, Andre, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M.\nSwetter, Helen M. Blau, and Sebastian Thrun. 2017.\n“Dermatologist-Level Classification of Skin Cancer with Deep\nNeural Networks.” Nature 542 (7639): 115–18. https://doi.org/10.1038/nature21056.\n\n\nEykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati,\nChaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2017.\n“Robust Physical-World Attacks on Deep Learning Models.”\nArXiv Preprint abs/1707.08945. https://arxiv.org/abs/1707.08945.\n\n\nFahim, Farah, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo\nJindariani, Nhan Tran, Luca P. Carloni, et al. 2021. “Hls4ml:\nAn Open-Source Codesign Workflow to Empower Scientific\nLow-Power Machine Learning Devices.” https://arxiv.org/abs/2103.05579.\n\n\nFarah, Martha J. 2005. “Neuroethics: The Practical\nand the Philosophical.” Trends Cogn. Sci. 9 (1): 34–40.\nhttps://doi.org/10.1016/j.tics.2004.12.001.\n\n\nFarwell, James P., and Rafal Rohozinski. 2011. “Stuxnet and the\nFuture of Cyber War.” Survival 53 (1): 23–40. https://doi.org/10.1080/00396338.2011.555586.\n\n\nFowers, Jeremy, Kalin Ovtcharov, Michael Papamichael, Todd Massengill,\nMing Liu, Daniel Lo, Shlomi Alkalay, et al. 2018. “A Configurable\nCloud-Scale DNN Processor for Real-Time\nAI.” In 2018 ACM/IEEE 45th Annual International\nSymposium on Computer Architecture (ISCA), 1–14. IEEE; IEEE. https://doi.org/10.1109/isca.2018.00012.\n\n\nFrancalanza, Adrian, Luca Aceto, Antonis Achilleos, Duncan Paul Attard,\nIan Cassar, Dario Della Monica, and Anna Ingólfsdóttir. 2017. “A\nFoundation for Runtime Monitoring.” In International\nConference on Runtime Verification, 8–29. Springer.\n\n\nFrankle, Jonathan, and Michael Carbin. 2019. “The Lottery Ticket\nHypothesis: Finding Sparse, Trainable Neural\nNetworks.” In 7th International Conference on Learning\nRepresentations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.\nOpenReview.net. https://openreview.net/forum?id=rJl-b3RcF7.\n\n\nFriedman, Batya. 1996. “Value-Sensitive Design.”\nInteractions 3 (6): 16–23. https://doi.org/10.1145/242485.242493.\n\n\nFurber, Steve. 2016. “Large-Scale Neuromorphic Computing\nSystems.” J. Neural Eng. 13 (5): 051001. https://doi.org/10.1088/1741-2560/13/5/051001.\n\n\nFursov, Ivan, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun,\nRodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey\nZaytsev, and Evgeny Burnaev. 2021. “Adversarial Attacks on Deep\nModels for Financial Transaction Records.” In Proceedings of\nthe 27th ACM SIGKDD Conference on Knowledge Discovery &Amp; Data\nMining, 2868–78. ACM. https://doi.org/10.1145/3447548.3467145.\n\n\nGale, Trevor, Erich Elsen, and Sara Hooker. 2019. “The State of\nSparsity in Deep Neural Networks.” ArXiv Preprint\nabs/1902.09574. https://arxiv.org/abs/1902.09574.\n\n\nGandolfi, Karine, Christophe Mourtel, and Francis Olivier. 2001.\n“Electromagnetic Analysis: Concrete Results.”\nIn Cryptographic Hardware and Embedded SystemsCHES\n2001: Third International Workshop Paris, France, May 1416,\n2001 Proceedings 3, 251–61. Springer.\n\n\nGannot, G., and M. Ligthart. 1994. “Verilog HDL Based\nFPGA Design.” In International Verilog HDL\nConference, 86–92. IEEE. https://doi.org/10.1109/ivc.1994.323743.\n\n\nGao, Yansong, Said F. Al-Sarawi, and Derek Abbott. 2020. “Physical\nUnclonable Functions.” Nature Electronics 3 (2): 81–91.\nhttps://doi.org/10.1038/s41928-020-0372-5.\n\n\nGates, Byron D. 2009. “Flexible Electronics.”\nScience 323 (5921): 1566–67. https://doi.org/10.1126/science.1171230.\n\n\nGebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman\nVaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021.\n“Datasheets for Datasets.” Commun. ACM 64 (12):\n86–92. https://doi.org/10.1145/3458723.\n\n\nGeiger, Atticus, Hanson Lu, Thomas Icard, and Christopher Potts. 2021.\n“Causal Abstractions of Neural Networks.” In Advances\nin Neural Information Processing Systems 34: Annual Conference on Neural\nInformation Processing Systems 2021, NeurIPS 2021, December 6-14, 2021,\nVirtual, edited by Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N.\nDauphin, Percy Liang, and Jennifer Wortman Vaughan, 9574–86. https://proceedings.neurips.cc/paper/2021/hash/4f5c422f4d49a5a807eda27434231040-Abstract.html.\n\n\nGholami, Dong Kim, Mahoney Yao, and Keutzer. 2021. “A Survey of\nQuantization Methods for Efficient Neural Network Inference).”\nArXiv Preprint. https://arxiv.org/abs/2103.13630.\n\n\nGlorot, Xavier, and Yoshua Bengio. 2010. “Understanding the\nDifficulty of Training Deep Feedforward Neural Networks.” In\nProceedings of the Thirteenth International Conference on Artificial\nIntelligence and Statistics. https://proceedings.mlr.press/v9/glorot10a.html.\n\n\nGnad, Dennis R. E., Fabian Oboril, and Mehdi B. Tahoori. 2017.\n“Voltage Drop-Based Fault Attacks on FPGAs Using\nValid Bitstreams.” In 2017 27th International Conference on\nField Programmable Logic and Applications (FPL), 1–7. IEEE; IEEE.\nhttps://doi.org/10.23919/fpl.2017.8056840.\n\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David\nWarde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020.\n“Generative Adversarial Networks.” Commun. ACM 63\n(11): 139–44. https://doi.org/10.1145/3422622.\n\n\nGoodyear, Victoria A. 2017. “Social Media, Apps and Wearable\nTechnologies: Navigating Ethical Dilemmas and\nProcedures.” Qualitative Research in Sport, Exercise and\nHealth 9 (3): 285–302. https://doi.org/10.1080/2159676x.2017.1303790.\n\n\nGoogle. n.d. “Information Quality Content Moderation.” https://blog.google/documents/83/.\n\n\nGordon, Ariel, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang,\nand Edward Choi. 2018. “MorphNet: Fast\n&Amp; Simple Resource-Constrained Structure Learning of Deep\nNetworks.” In 2018 IEEE/CVF Conference on Computer Vision and\nPattern Recognition, 1586–95. IEEE. https://doi.org/10.1109/cvpr.2018.00171.\n\n\nGräfe, Ralf, Qutub Syed Sha, Florian Geissler, and Michael Paulitsch.\n2023. “Large-Scale Application of Fault Injection into\nPyTorch Models -an Extension to PyTorchFI for\nValidation Efficiency.” In 2023 53rd Annual IEEE/IFIP\nInternational Conference on Dependable Systems and Networks -\nSupplemental Volume (DSN-s), 56–62. IEEE; IEEE. https://doi.org/10.1109/dsn-s58398.2023.00025.\n\n\nGreengard, Samuel. 2015. The Internet of Things. The MIT Press.\nhttps://doi.org/10.7551/mitpress/10277.001.0001.\n\n\nGrossman, Elizabeth. 2007. High Tech Trash: Digital\nDevices, Hidden Toxics, and Human Health. Island press.\n\n\nGruslys, Audrunas, Rémi Munos, Ivo Danihelka, Marc Lanctot, and Alex\nGraves. 2016. “Memory-Efficient Backpropagation Through\nTime.” In Advances in Neural Information Processing Systems\n29: Annual Conference on Neural Information Processing Systems 2016,\nDecember 5-10, 2016, Barcelona, Spain, edited by Daniel D. Lee,\nMasashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett,\n4125–33. https://proceedings.neurips.cc/paper/2016/hash/a501bebf79d570651ff601788ea9d16d-Abstract.html.\n\n\nGu, Ivy. 2023. “Deep Learning Model Compression (Ii) by Ivy Gu\nMedium.” https://ivygdy.medium.com/deep-learning-model-compression-ii-546352ea9453.\n\n\nGuo, Chuan, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian\nWeinberger. 2019. “Simple Black-Box Adversarial Attacks.”\nIn International Conference on Machine Learning, 2484–93. PMLR.\n\n\nGuo, Yutao, Hao Wang, Hui Zhang, Tong Liu, Zhaoguang Liang, Yunlong Xia,\nLi Yan, et al. 2019. “Mobile Photoplethysmographic Technology to\nDetect Atrial Fibrillation.” J. Am. Coll. Cardiol. 74\n(19): 2365–75. https://doi.org/10.1016/j.jacc.2019.08.019.\n\n\nGupta, Maanak, Charankumar Akiri, Kshitiz Aryal, Eli Parker, and\nLopamudra Praharaj. 2023. “From ChatGPT to\nThreatGPT: Impact of Generative\nAI in Cybersecurity and Privacy.”\n#IEEE_O_ACC# 11: 80218–45. https://doi.org/10.1109/access.2023.3300381.\n\n\nGupta, Maya, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin\nCanini, Alexander Mangylov, Wojciech Moczydlowski, and Alexander Van\nEsbroeck. 2016. “Monotonic Calibrated Interpolated Look-up\nTables.” The Journal of Machine Learning Research 17\n(1): 3790–3836.\n\n\nGupta, Udit, Mariam Elgamal, Gage Hills, Gu-Yeon Wei, Hsien-Hsin S. Lee,\nDavid Brooks, and Carole-Jean Wu. 2022. “Act: Designing\nSustainable Computer Systems with an Architectural Carbon Modeling\nTool.” In Proceedings of the 49th Annual International\nSymposium on Computer Architecture, 784–99. ACM. https://doi.org/10.1145/3470496.3527408.\n\n\nGwennap, Linley. n.d. “Certus-NX Innovates\nGeneral-Purpose FPGAs.”\n\n\nHaensch, Wilfried, Tayfun Gokmen, and Ruchir Puri. 2019. “The Next\nGeneration of Deep Learning Hardware: Analog\nComputing.” Proc. IEEE 107 (1): 108–22. https://doi.org/10.1109/jproc.2018.2871057.\n\n\nHamming, R. W. 1950. “Error Detecting and Error Correcting\nCodes.” Bell Syst. Tech. J. 29 (2): 147–60. https://doi.org/10.1002/j.1538-7305.1950.tb00463.x.\n\n\nHan, Song, Huizi Mao, and William J Dally. 2015. “Deep\nCompression: Compressing Deep Neural Networks with Pruning, Trained\nQuantization and Huffman Coding.” arXiv Preprint\narXiv:1510.00149.\n\n\nHan, Song, Huizi Mao, and William J. Dally. 2016. “Deep\nCompression: Compressing Deep Neural Networks with Pruning,\nTrained Quantization and Huffman Coding.” https://arxiv.org/abs/1510.00149.\n\n\nHandlin, Oscar. 1965. “Science and Technology in Popular\nCulture.” Daedalus-Us., 156–70.\n\n\nHardt, Moritz, Eric Price, and Nati Srebro. 2016. “Equality of\nOpportunity in Supervised Learning.” In Advances in Neural\nInformation Processing Systems 29: Annual Conference on Neural\nInformation Processing Systems 2016, December 5-10, 2016, Barcelona,\nSpain, edited by Daniel D. Lee, Masashi Sugiyama, Ulrike von\nLuxburg, Isabelle Guyon, and Roman Garnett, 3315–23. https://proceedings.neurips.cc/paper/2016/hash/9d2682367c3935defcb1f9e247a97c0d-Abstract.html.\n\n\nHawks, Benjamin, Javier Duarte, Nicholas J. Fraser, Alessandro\nPappalardo, Nhan Tran, and Yaman Umuroglu. 2021. “Ps and Qs: Quantization-aware Pruning for Efficient Low\nLatency Neural Network Inference.” Frontiers in Artificial\nIntelligence 4 (July). https://doi.org/10.3389/frai.2021.676564.\n\n\nHazan, Avi, and Elishai Ezra Tsur. 2021. “Neuromorphic Analog\nImplementation of Neural Engineering Framework-Inspired Spiking Neuron\nfor High-Dimensional Representation.” Front. Neurosci.\n15 (February): 627221. https://doi.org/10.3389/fnins.2021.627221.\n\n\nHe, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015.\n“Delving Deep into Rectifiers: Surpassing Human-Level\nPerformance on ImageNet Classification.” In 2015\nIEEE International Conference on Computer Vision (ICCV), 1026–34.\nIEEE. https://doi.org/10.1109/iccv.2015.123.\n\n\n———. 2016. “Deep Residual Learning for Image Recognition.”\nIn 2016 IEEE Conference on Computer Vision and Pattern Recognition\n(CVPR), 770–78. IEEE. https://doi.org/10.1109/cvpr.2016.90.\n\n\nHe, Yi, Prasanna Balaprakash, and Yanjing Li. 2020.\n“FIdelity: Efficient Resilience Analysis\nFramework for Deep Learning Accelerators.” In 2020 53rd\nAnnual IEEE/ACM International Symposium on Microarchitecture\n(MICRO), 270–81. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00033.\n\n\nHe, Yi, Mike Hutton, Steven Chan, Robert De Gruijl, Rama Govindaraju,\nNishant Patil, and Yanjing Li. 2023. “Understanding and Mitigating\nHardware Failures in Deep Learning Training Systems.” In\nProceedings of the 50th Annual International Symposium on Computer\nArchitecture, 1–16. IEEE; ACM. https://doi.org/10.1145/3579371.3589105.\n\n\nHébert-Johnson, Úrsula, Michael P. Kim, Omer Reingold, and Guy N.\nRothblum. 2018. “Multicalibration: Calibration for\nthe (Computationally-Identifiable) Masses.” In\nProceedings of the 35th International Conference on Machine\nLearning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15,\n2018, edited by Jennifer G. Dy and Andreas Krause, 80:1944–53.\nProceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v80/hebert-johnson18a.html.\n\n\nHegde, Sumant. 2023. “An Introduction to Separable Convolutions -\nAnalytics Vidhya.” https://www.analyticsvidhya.com/blog/2021/11/an-introduction-to-separable-convolutions/.\n\n\nHenderson, Peter, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky,\nand Joelle Pineau. 2020. “Towards the Systematic Reporting of the\nEnergy and Carbon Footprints of Machine Learning.” The\nJournal of Machine Learning Research 21 (1): 10039–81.\n\n\nHendrycks, Dan, and Thomas Dietterich. 2019. “Benchmarking Neural\nNetwork Robustness to Common Corruptions and Perturbations.”\narXiv Preprint arXiv:1903.12261.\n\n\nHendrycks, Dan, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn\nSong. 2021. “Natural Adversarial Examples.” In 2021\nIEEE/CVF Conference on Computer Vision and Pattern Recognition\n(CVPR), 15262–71. IEEE. https://doi.org/10.1109/cvpr46437.2021.01501.\n\n\nHennessy, John L., and David A. Patterson. 2019. “A New Golden Age\nfor Computer Architecture.” Commun. ACM 62 (2): 48–60.\nhttps://doi.org/10.1145/3282307.\n\n\nHimmelstein, Gracie, David Bates, and Li Zhou. 2022. “Examination\nof Stigmatizing Language in the Electronic Health Record.”\nJAMA Network Open 5 (1): e2144967. https://doi.org/10.1001/jamanetworkopen.2021.44967.\n\n\nHinton, Geoffrey. 2005. “Van Nostrand’s Scientific Encyclopedia.” Wiley.\nhttps://doi.org/10.1002/0471743984.vse0673.\n\n\n———. 2017. “Overview of Minibatch Gradient Descent.”\nUniversity of Toronto; University Lecture.\n\n\nHo Yoon, Jung, Hyung-Suk Jung, Min Hwan Lee, Gun Hwan Kim, Seul Ji Song,\nJun Yeong Seok, Kyung Jean Yoon, et al. 2012. “Frontiers in\nElectronic Materials.” Wiley. https://doi.org/10.1002/9783527667703.ch67.\n\n\nHoefler, Torsten, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and\nAlexandra Peste. 2021. “Sparsity in Deep Learning:\nPruning and Growth for Efficient Inference and Training in\nNeural Networks.” https://arxiv.org/abs/2102.00554.\n\n\nHolland, Sarah, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia\nChmielinski. 2020. “The Dataset Nutrition Label: A Framework to\nDrive Higher Data Quality Standards.” In Data Protection and\nPrivacy. Hart Publishing. https://doi.org/10.5040/9781509932771.ch-001.\n\n\nHong, Sanghyun, Nicholas Carlini, and Alexey Kurakin. 2023.\n“Publishing Efficient on-Device Models Increases Adversarial\nVulnerability.” In 2023 IEEE Conference on Secure and\nTrustworthy Machine Learning (SaTML), 271–90. IEEE; IEEE. https://doi.org/10.1109/satml54575.2023.00026.\n\n\nHosseini, Hossein, Sreeram Kannan, Baosen Zhang, and Radha Poovendran.\n2017. “Deceiving Google’s Perspective Api Built for Detecting\nToxic Comments.” ArXiv Preprint abs/1702.08138. https://arxiv.org/abs/1702.08138.\n\n\nHoward, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun\nWang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017.\n“MobileNets: Efficient Convolutional\nNeural Networks for Mobile Vision Applications.” ArXiv\nPreprint. https://arxiv.org/abs/1704.04861.\n\n\nHsiao, Yu-Shun, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman\nMahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, and Vijay\nJanapa Reddi. 2023. “MAVFI: An\nEnd-to-End Fault Analysis Framework with Anomaly Detection and Recovery\nfor Micro Aerial Vehicles.” In 2023 Design, Automation\n&Amp; Test in Europe Conference &Amp; Exhibition (DATE),\n1–6. IEEE; IEEE. https://doi.org/10.23919/date56975.2023.10137246.\n\n\nHsu, Liang-Ching, Ching-Yi Huang, Yen-Hsun Chuang, Ho-Wen Chen, Ya-Ting\nChan, Heng Yi Teah, Tsan-Yao Chen, Chiung-Fen Chang, Yu-Ting Liu, and\nYu-Min Tzou. 2016. “Accumulation of Heavy Metals and Trace\nElements in Fluvial Sediments Received Effluents from Traditional and\nSemiconductor Industries.” Scientific Reports 6 (1):\n34250. https://doi.org/10.1038/srep34250.\n\n\nHu, Jie, Li Shen, and Gang Sun. 2018. “Squeeze-and-Excitation\nNetworks.” In 2018 IEEE/CVF Conference on Computer Vision and\nPattern Recognition, 7132–41. IEEE. https://doi.org/10.1109/cvpr.2018.00745.\n\n\nHu, Yang, Jie Jiang, Lifu Zhang, Yunfeng Shi, and Jian Shi. 2023.\n“Halide Perovskite Semiconductors.” Wiley. https://doi.org/10.1002/9783527829026.ch13.\n\n\nHuang, Tsung-Ching, Kenjiro Fukuda, Chun-Ming Lo, Yung-Hui Yeh, Tsuyoshi\nSekitani, Takao Someya, and Kwang-Ting Cheng. 2011.\n“Pseudo-CMOS: A Design Style for\nLow-Cost and Robust Flexible Electronics.” IEEE Trans.\nElectron Devices 58 (1): 141–50. https://doi.org/10.1109/ted.2010.2088127.\n\n\nHutter, Michael, Jorn-Marc Schmidt, and Thomas Plos. 2009.\n“Contact-Based Fault Injections and Power Analysis on\nRFID Tags.” In 2009 European Conference on\nCircuit Theory and Design, 409–12. IEEE; IEEE. https://doi.org/10.1109/ecctd.2009.5275012.\n\n\nIandola, Forrest N, Song Han, Matthew W Moskewicz, Khalid Ashraf,\nWilliam J Dally, and Kurt Keutzer. 2016. “SqueezeNet:\nAlexnet-level Accuracy with 50x Fewer\nParameters and 0.5 MB Model Size.” ArXiv\nPreprint abs/1602.07360. https://arxiv.org/abs/1602.07360.\n\n\nIgnatov, Andrey, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim\nHartley, and Luc Van Gool. 2018. “AI Benchmark:\nRunning Deep Neural Networks on Android\nSmartphones,” 0–0.\n\n\nImani, Mohsen, Abbas Rahimi, and Tajana S. Rosing. 2016.\n“Resistive Configurable Associative Memory for Approximate\nComputing.” In Proceedings of the 2016 Design, Automation\n&Amp; Test in Europe Conference &Amp; Exhibition (DATE),\n1327–32. IEEE; Research Publishing Services. https://doi.org/10.3850/9783981537079_0454.\n\n\nIntelLabs. 2023. “Knowledge Distillation - Neural Network\nDistiller.” https://intellabs.github.io/distiller/knowledge_distillation.html.\n\n\nIppolito, Daphne, Florian Tramer, Milad Nasr, Chiyuan Zhang, Matthew\nJagielski, Katherine Lee, Christopher Choquette Choo, and Nicholas\nCarlini. 2023. “Preventing Generation of Verbatim Memorization in\nLanguage Models Gives a False Sense of Privacy.” In\nProceedings of the 16th International Natural Language Generation\nConference, 5253–70. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.inlg-main.3.\n\n\nIrimia-Vladu, Mihai. 2014.\n““Green” Electronics:\nBiodegradable and Biocompatible Materials and Devices for\nSustainable Future.” Chem. Soc. Rev. 43 (2): 588–610. https://doi.org/10.1039/c3cs60235d.\n\n\nIsscc. 2014. “Computing’s Energy Problem (and What We Can Do about\nIt).” https://ieeexplore.ieee.org/document/6757323.\n\n\nJacob, Benoit, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang,\nAndrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018.\n“Quantization and Training of Neural Networks for Efficient\nInteger-Arithmetic-Only Inference.” In 2018 IEEE/CVF\nConference on Computer Vision and Pattern Recognition, 2704–13.\nIEEE. https://doi.org/10.1109/cvpr.2018.00286.\n\n\nJanapa Reddi, Vijay, Alexander Elium, Shawn Hymel, David Tischler,\nDaniel Situnayake, Carl Ward, Louis Moreau, et al. 2023. “Edge\nImpulse: An MLOps Platform for Tiny Machine\nLearning.” Proceedings of Machine Learning and Systems\n5.\n\n\nJha, A. R. 2014. Rare Earth Materials: Properties and\nApplications. CRC Press. https://doi.org/10.1201/b17045.\n\n\nJha, Saurabh, Subho Banerjee, Timothy Tsai, Siva K. S. Hari, Michael B.\nSullivan, Zbigniew T. Kalbarczyk, Stephen W. Keckler, and Ravishankar K.\nIyer. 2019. “ML-Based Fault Injection for Autonomous\nVehicles: A Case for Bayesian Fault\nInjection.” In 2019 49th Annual IEEE/IFIP International\nConference on Dependable Systems and Networks (DSN), 112–24. IEEE;\nIEEE. https://doi.org/10.1109/dsn.2019.00025.\n\n\nJia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan\nLong, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014.\n“Caffe: Convolutional Architecture for Fast Feature\nEmbedding.” In Proceedings of the 22nd ACM International\nConference on Multimedia, 675–78. ACM. https://doi.org/10.1145/2647868.2654889.\n\n\nJia, Zhe, Marco Maggioni, Benjamin Staiger, and Daniele P. Scarpazza.\n2018. “Dissecting the NVIDIA Volta\nGPU Architecture via Microbenchmarking.” ArXiv\nPreprint. https://arxiv.org/abs/1804.06826.\n\n\nJia, Zhenge, Dawei Li, Xiaowei Xu, Na Li, Feng Hong, Lichuan Ping, and\nYiyu Shi. 2023. “Life-Threatening Ventricular Arrhythmia Detection\nChallenge in Implantable\nCardioverterdefibrillators.” Nature Machine\nIntelligence 5 (5): 554–55. https://doi.org/10.1038/s42256-023-00659-9.\n\n\nJia, Zhihao, Matei Zaharia, and Alex Aiken. 2019. “Beyond Data and\nModel Parallelism for Deep Neural Networks.” In Proceedings\nof Machine Learning and Systems 2019, MLSys 2019, Stanford, CA, USA,\nMarch 31 - April 2, 2019, edited by Ameet Talwalkar, Virginia\nSmith, and Matei Zaharia. mlsys.org. https://proceedings.mlsys.org/book/265.pdf.\n\n\nJin, Yilun, Xiguang Wei, Yang Liu, and Qiang Yang. 2020. “Towards\nUtilizing Unlabeled Data in Federated Learning: A Survey\nand Prospective.” arXiv Preprint arXiv:2002.11545.\n\n\nJohnson-Roberson, Matthew, Charles Barto, Rounak Mehta, Sharath Nittur\nSridhar, Karl Rosaen, and Ram Vasudevan. 2017. “Driving in the\nMatrix: Can Virtual Worlds Replace Human-Generated\nAnnotations for Real World Tasks?” In 2017 IEEE International\nConference on Robotics and Automation (ICRA), 746–53. Singapore,\nSingapore: IEEE. https://doi.org/10.1109/icra.2017.7989092.\n\n\nJouppi, Norman P., Cliff Young, Nishant Patil, David Patterson, Gaurav\nAgrawal, Raminder Bajwa, Sarah Bates, et al. 2017a. “In-Datacenter\nPerformance Analysis of a Tensor Processing Unit.” In\nProceedings of the 44th Annual International Symposium on Computer\nArchitecture, 1–12. ISCA ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\n\n\n———, et al. 2017b. “In-Datacenter Performance Analysis of a Tensor\nProcessing Unit.” In Proceedings of the 44th Annual\nInternational Symposium on Computer Architecture, 1–12. ISCA ’17.\nNew York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\n\n\nJouppi, Norm, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng\nNai, Nishant Patil, et al. 2023. “TPU V4:\nAn Optically Reconfigurable Supercomputer for Machine\nLearning with Hardware Support for Embeddings.” In\nProceedings of the 50th Annual International Symposium on Computer\nArchitecture. ISCA ’23. New York, NY, USA: ACM. https://doi.org/10.1145/3579371.3589350.\n\n\nJoye, Marc, and Michael Tunstall. 2012. Fault Analysis in\nCryptography. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-29656-7.\n\n\nKairouz, Peter, Sewoong Oh, and Pramod Viswanath. 2015. “Secure\nMulti-Party Differential Privacy.” In Advances in Neural\nInformation Processing Systems 28: Annual Conference on Neural\nInformation Processing Systems 2015, December 7-12, 2015, Montreal,\nQuebec, Canada, edited by Corinna Cortes, Neil D. Lawrence, Daniel\nD. Lee, Masashi Sugiyama, and Roman Garnett, 2008–16. https://proceedings.neurips.cc/paper/2015/hash/a01610228fe998f515a72dd730294d87-Abstract.html.\n\n\nKalamkar, Dhiraj, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das,\nKunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, et al. 2019.\n“A Study of BFLOAT16 for Deep Learning\nTraining.” https://arxiv.org/abs/1905.12322.\n\n\nKao, Sheng-Chun, Geonhwa Jeong, and Tushar Krishna. 2020.\n“ConfuciuX: Autonomous Hardware Resource\nAssignment for DNN Accelerators Using Reinforcement\nLearning.” In 2020 53rd Annual IEEE/ACM International\nSymposium on Microarchitecture (MICRO), 622–36. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00058.\n\n\nKao, Sheng-Chun, and Tushar Krishna. 2020. “Gamma: Automating the\nHW Mapping of DNN Models on Accelerators via Genetic Algorithm.”\nIn Proceedings of the 39th International Conference on\nComputer-Aided Design, 1–9. ACM. https://doi.org/10.1145/3400302.3415639.\n\n\nKaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin\nChess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario\nAmodei. 2020. “Scaling Laws for Neural Language Models.”\nArXiv Preprint abs/2001.08361. https://arxiv.org/abs/2001.08361.\n\n\nKarargyris, Alexandros, Renato Umeton, Micah J Sheller, Alejandro\nAristizabal, Johnu George, Anna Wuest, Sarthak Pati, et al. 2023.\n“Federated Benchmarking of Medical Artificial Intelligence with\nMedPerf.” Nature Machine Intelligence 5\n(7): 799–810. https://doi.org/10.1038/s42256-023-00652-2.\n\n\nKaur, Harmanpreet, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna\nWallach, and Jennifer Wortman Vaughan. 2020. “Interpreting\nInterpretability: Understanding Data Scientists’ Use of\nInterpretability Tools for Machine Learning.” In Proceedings\nof the 2020 CHI Conference on Human Factors in Computing Systems,\nedited by Regina Bernhaupt, Florian ’Floyd’Mueller, David Verweij, Josh\nAndres, Joanna McGrenere, Andy Cockburn, Ignacio Avellino, et al., 1–14.\nACM. https://doi.org/10.1145/3313831.3376219.\n\n\nKawazoe Aguilera, Marcos, Wei Chen, and Sam Toueg. 1997.\n“Heartbeat: A Timeout-Free Failure Detector for\nQuiescent Reliable Communication.” In Distributed Algorithms:\n11th International Workshop, WDAG’97 Saarbrücken, Germany, September\n2426, 1997 Proceedings 11, 126–40. Springer.\n\n\nKhan, Mohammad Emtiyaz, and Siddharth Swaroop. 2021.\n“Knowledge-Adaptation Priors.” In Advances in Neural\nInformation Processing Systems 34: Annual Conference on Neural\nInformation Processing Systems 2021, NeurIPS 2021, December 6-14, 2021,\nVirtual, edited by Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N.\nDauphin, Percy Liang, and Jennifer Wortman Vaughan, 19757–70. https://proceedings.neurips.cc/paper/2021/hash/a4380923dd651c195b1631af7c829187-Abstract.html.\n\n\nKiela, Douwe, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger,\nZhengxuan Wu, Bertie Vidgen, et al. 2021. “Dynabench:\nRethinking Benchmarking in NLP.” In\nProceedings of the 2021 Conference of the North American Chapter of\nthe Association for Computational Linguistics: Human Language\nTechnologies, 4110–24. Online: Association for Computational\nLinguistics. https://doi.org/10.18653/v1/2021.naacl-main.324.\n\n\nKim, Jungrae, Michael Sullivan, and Mattan Erez. 2015. “Bamboo\nECC: Strong, Safe, and Flexible Codes for\nReliable Computer Memory.” In 2015 IEEE 21st International\nSymposium on High Performance Computer Architecture (HPCA), 101–12.\nIEEE; IEEE. https://doi.org/10.1109/hpca.2015.7056025.\n\n\nKim, Sunju, Chungsik Yoon, Seunghon Ham, Jihoon Park, Ohun Kwon, Donguk\nPark, Sangjun Choi, Seungwon Kim, Kwonchul Ha, and Won Kim. 2018.\n“Chemical Use in the Semiconductor Manufacturing Industry.”\nInt. J. Occup. Env. Heal. 24 (3-4): 109–18. https://doi.org/10.1080/10773525.2018.1519957.\n\n\nKingma, Diederik P., and Jimmy Ba. 2015. “Adam: A\nMethod for Stochastic Optimization.” In 3rd International\nConference on Learning Representations, ICLR 2015, San Diego, CA, USA,\nMay 7-9, 2015, Conference Track Proceedings, edited by Yoshua\nBengio and Yann LeCun. http://arxiv.org/abs/1412.6980.\n\n\nKirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness,\nGuillaume Desjardins, Andrei A. Rusu, Kieran Milan, et al. 2017.\n“Overcoming Catastrophic Forgetting in Neural Networks.”\nProc. Natl. Acad. Sci. 114 (13): 3521–26. https://doi.org/10.1073/pnas.1611835114.\n\n\nKo, Yohan. 2021. “Characterizing System-Level Masking Effects\nAgainst Soft Errors.” Electronics 10 (18): 2286. https://doi.org/10.3390/electronics10182286.\n\n\nKocher, Paul, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss,\nWerner Haas, Mike Hamburg, et al. 2019a. “Spectre Attacks:\nExploiting Speculative Execution.” In 2019 IEEE\nSymposium on Security and Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\n\n\n———, et al. 2019b. “Spectre Attacks: Exploiting\nSpeculative Execution.” In 2019 IEEE Symposium on Security\nand Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\n\n\nKocher, Paul, Joshua Jaffe, and Benjamin Jun. 1999. “Differential\nPower Analysis.” In Advances in\nCryptologyCRYPTO’99: 19th Annual International Cryptology\nConference Santa Barbara, California, USA, August 1519,\n1999 Proceedings 19, 388–97. Springer.\n\n\nKocher, Paul, Joshua Jaffe, Benjamin Jun, and Pankaj Rohatgi. 2011.\n“Introduction to Differential Power Analysis.” Journal\nof Cryptographic Engineering 1 (1): 5–27. https://doi.org/10.1007/s13389-011-0006-y.\n\n\nKoh, Pang Wei, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma\nPierson, Been Kim, and Percy Liang. 2020. “Concept Bottleneck\nModels.” In Proceedings of the 37th International Conference\non Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event,\n119:5338–48. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v119/koh20a.html.\n\n\nKoh, Pang Wei, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin\nZhang, Akshay Balsubramani, Weihua Hu, et al. 2021.\n“WILDS: A Benchmark of in-the-Wild\nDistribution Shifts.” In Proceedings of the 38th\nInternational Conference on Machine Learning, ICML 2021, 18-24 July\n2021, Virtual Event, edited by Marina Meila and Tong Zhang,\n139:5637–64. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/koh21a.html.\n\n\nKoren, Yehuda, Robert Bell, and Chris Volinsky. 2009. “Matrix\nFactorization Techniques for Recommender Systems.”\nComputer 42 (8): 30–37. https://doi.org/10.1109/mc.2009.263.\n\n\nKrishna, Adithya, Srikanth Rohit Nudurupati, Chandana D G, Pritesh\nDwivedi, André van Schaik, Mahesh Mehendale, and Chetan Singh Thakur.\n2023. “RAMAN: A Re-Configurable and\nSparse TinyML Accelerator for Inference on Edge.” https://arxiv.org/abs/2306.06493.\n\n\nKrishnamoorthi. 2018. “Quantizing Deep Convolutional Networks for\nEfficient Inference: A Whitepaper.” ArXiv\nPreprint. https://arxiv.org/abs/1806.08342.\n\n\nKrishnan, Rayan, Pranav Rajpurkar, and Eric J. Topol. 2022.\n“Self-Supervised Learning in Medicine and Healthcare.”\nNat. Biomed. Eng. 6 (12): 1346–52. https://doi.org/10.1038/s41551-022-00914-1.\n\n\nKrishnan, Srivatsan, Natasha Jaques, Shayegan Omidshafiei, Dan Zhang,\nIzzeddin Gur, Vijay Janapa Reddi, and Aleksandra Faust. 2022.\n“Multi-Agent Reinforcement Learning for Microprocessor Design\nSpace Exploration.” https://arxiv.org/abs/2211.16385.\n\n\nKrishnan, Srivatsan, Amir Yazdanbakhsh, Shvetank Prakash, Jason Jabbour,\nIkechukwu Uchendu, Susobhan Ghosh, Behzad Boroujerdian, et al. 2023.\n“ArchGym: An Open-Source Gymnasium for\nMachine Learning Assisted Architecture Design.” In\nProceedings of the 50th Annual International Symposium on Computer\nArchitecture, 1–16. ACM. https://doi.org/10.1145/3579371.3589049.\n\n\nKrizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. 2012a.\n“ImageNet Classification with Deep Convolutional Neural\nNetworks.” Edited by F. Pereira, C. J. Burges, L. Bottou, and K.\nQ. Weinberger 25. https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.\n\n\nKrizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012b.\n“ImageNet Classification with Deep Convolutional\nNeural Networks.” In Advances in Neural Information\nProcessing Systems 25: 26th Annual Conference on Neural Information\nProcessing Systems 2012. Proceedings of a Meeting Held December 3-6,\n2012, Lake Tahoe, Nevada, United States, edited by Peter L.\nBartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou,\nand Kilian Q. Weinberger, 1106–14. https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.\n\n\nKung, Hsiang Tsung, and Charles E Leiserson. 1979. “Systolic\nArrays (for VLSI).” In Sparse Matrix Proceedings\n1978, 1:256–82. Society for industrial; applied mathematics\nPhiladelphia, PA, USA.\n\n\nKurth, Thorsten, Shashank Subramanian, Peter Harrington, Jaideep Pathak,\nMorteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, and Anima\nAnandkumar. 2023. “FourCastNet:\nAccelerating Global High-Resolution Weather Forecasting\nUsing Adaptive Fourier Neural Operators.” In\nProceedings of the Platform for Advanced Scientific Computing\nConference, 1–11. ACM. https://doi.org/10.1145/3592979.3593412.\n\n\nKuzmin, Andrey, Mart Van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters,\nand Tijmen Blankevoort. 2022. “FP8 Quantization:\nThe Power of the Exponent.” https://arxiv.org/abs/2208.09225.\n\n\nKwon, Jisu, and Daejin Park. 2021. “Hardware/Software\nCo-Design for TinyML Voice-Recognition Application on\nResource Frugal Edge Devices.” Applied Sciences 11 (22):\n11073. https://doi.org/10.3390/app112211073.\n\n\nKwon, Sun Hwa, and Lin Dong. 2022. “Flexible Sensors and Machine\nLearning for Heart Monitoring.” Nano Energy 102\n(November): 107632. https://doi.org/10.1016/j.nanoen.2022.107632.\n\n\nKwon, Young D, Rui Li, Stylianos I Venieris, Jagmohan Chauhan, Nicholas\nD Lane, and Cecilia Mascolo. 2023. “TinyTrain:\nDeep Neural Network Training at the Extreme Edge.”\nArXiv Preprint abs/2307.09988. https://arxiv.org/abs/2307.09988.\n\n\nLai, Liangzhen, Naveen Suda, and Vikas Chandra. 2018a. “Cmsis-Nn:\nEfficient Neural Network Kernels for Arm Cortex-m\nCpus.” ArXiv Preprint abs/1801.06601. https://arxiv.org/abs/1801.06601.\n\n\n———. 2018b. “CMSIS-NN:\nEfficient Neural Network Kernels for Arm Cortex-m\nCPUs.” https://arxiv.org/abs/1801.06601.\n\n\nLakkaraju, Himabindu, and Osbert Bastani. 2020. “”How\nDo i Fool You?”: Manipulating User Trust via Misleading Black Box\nExplanations.” In Proceedings of the AAAI/ACM Conference on\nAI, Ethics, and Society, 79–85. ACM. https://doi.org/10.1145/3375627.3375833.\n\n\nLam, Remi, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger,\nMeire Fortunato, Ferran Alet, Suman Ravuri, et al. 2023. “Learning\nSkillful Medium-Range Global Weather Forecasting.”\nScience 382 (6677): 1416–21. https://doi.org/10.1126/science.adi2336.\n\n\nLannelongue, Loı̈c, Jason Grealey, and Michael Inouye. 2021. “Green\nAlgorithms: Quantifying the Carbon Footprint of\nComputation.” Adv. Sci. 8 (12): 2100707. https://doi.org/10.1002/advs.202100707.\n\n\nLeCun, Yann, John Denker, and Sara Solla. 1989. “Optimal Brain\nDamage.” Adv Neural Inf Process Syst 2.\n\n\nLee, Minwoong, Namho Lee, Huijeong Gwon, Jongyeol Kim, Younggwan Hwang,\nand Seongik Cho. 2022. “Design of Radiation-Tolerant High-Speed\nSignal Processing Circuit for Detecting Prompt Gamma Rays by Nuclear\nExplosion.” Electronics 11 (18): 2970. https://doi.org/10.3390/electronics11182970.\n\n\nLeRoy Poff, N, MM Brinson, and JW Day. 2002. “Aquatic Ecosystems\n& Global Climate Change.” Pew Center on Global Climate\nChange.\n\n\nLi, En, Liekang Zeng, Zhi Zhou, and Xu Chen. 2020. “Edge\nAI: On-demand Accelerating Deep\nNeural Network Inference via Edge Computing.” IEEE Trans.\nWireless Commun. 19 (1): 447–57. https://doi.org/10.1109/twc.2019.2946140.\n\n\nLi, Guanpeng, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai,\nKarthik Pattabiraman, Joel Emer, and Stephen W. Keckler. 2017.\n“Understanding Error Propagation in Deep Learning Neural Network\n(DNN) Accelerators and Applications.” In\nProceedings of the International Conference for High Performance\nComputing, Networking, Storage and Analysis, 1–12. ACM. https://doi.org/10.1145/3126908.3126964.\n\n\nLi, Jingzhen, Igbe Tobore, Yuhang Liu, Abhishek Kandwal, Lei Wang, and\nZedong Nie. 2021. “Non-Invasive Monitoring of Three Glucose Ranges\nBased on ECG by Using\nDBSCAN-CNN.” #IEEE_J_BHI# 25\n(9): 3340–50. https://doi.org/10.1109/jbhi.2021.3072628.\n\n\nLi, Mu, David G. Andersen, Alexander J. Smola, and Kai Yu. 2014.\n“Communication Efficient Distributed Machine Learning with the\nParameter Server.” In Advances in Neural Information\nProcessing Systems 27: Annual Conference on Neural Information\nProcessing Systems 2014, December 8-13 2014, Montreal, Quebec,\nCanada, edited by Zoubin Ghahramani, Max Welling, Corinna Cortes,\nNeil D. Lawrence, and Kilian Q. Weinberger, 19–27. https://proceedings.neurips.cc/paper/2014/hash/1ff1de774005f8da13f42943881c655f-Abstract.html.\n\n\nLi, Qinbin, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu,\nand Bingsheng He. 2023. “A Survey on Federated Learning Systems:\nVision, Hype and Reality for Data Privacy and\nProtection.” IEEE Trans. Knowl. Data Eng. 35 (4):\n3347–66. https://doi.org/10.1109/tkde.2021.3124599.\n\n\nLi, Tian, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. 2020.\n“Federated Learning: Challenges, Methods, and Future\nDirections.” IEEE Signal Process Mag. 37 (3): 50–60. https://doi.org/10.1109/msp.2020.2975749.\n\n\nLi, Xiang, Tao Qin, Jian Yang, and Tie-Yan Liu. 2016.\n“LightRNN: Memory and\nComputation-Efficient Recurrent Neural Networks.” In Advances\nin Neural Information Processing Systems 29: Annual Conference on Neural\nInformation Processing Systems 2016, December 5-10, 2016, Barcelona,\nSpain, edited by Daniel D. Lee, Masashi Sugiyama, Ulrike von\nLuxburg, Isabelle Guyon, and Roman Garnett, 4385–93. https://proceedings.neurips.cc/paper/2016/hash/c3e4035af2a1cde9f21e1ae1951ac80b-Abstract.html.\n\n\nLi, Yuhang, Xin Dong, and Wei Wang. 2020. “Additive Powers-of-Two\nQuantization: An Efficient Non-Uniform Discretization for\nNeural Networks.” In 8th International Conference on Learning\nRepresentations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30,\n2020. OpenReview.net. https://openreview.net/forum?id=BkgXT24tDS.\n\n\nLi, Zhizhong, and Derek Hoiem. 2018. “Learning Without\nForgetting.” IEEE Trans. Pattern Anal. Mach. Intell. 40\n(12): 2935–47. https://doi.org/10.1109/tpami.2017.2773081.\n\n\nLin, Ji, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han.\n2020. “MCUNet: Tiny Deep Learning on\nIoT Devices.” In Advances in Neural Information\nProcessing Systems 33: Annual Conference on Neural Information\nProcessing Systems 2020, NeurIPS 2020, December 6-12, 2020,\nVirtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia\nHadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/86c51678350f656dcc7f490a43946ee5-Abstract.html.\n\n\nLin, Ji, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song\nHan. 2022. “On-Device Training Under 256kb Memory.”\nAdv. Neur. In. 35: 22941–54.\n\n\nLin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona,\nDeva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014.\n“Microsoft Coco: Common Objects in Context.”\nIn Computer VisionECCV 2014: 13th European Conference,\nZurich, Switzerland, September 6-12, 2014, Proceedings, Part v 13,\n740–55. Springer.\n\n\nLindgren, Simon. 2023. Handbook of Critical Studies of Artificial\nIntelligence. Edward Elgar Publishing.\n\n\nLindholm, Andreas, Dave Zachariah, Petre Stoica, and Thomas B. Schon.\n2019. “Data Consistency Approach to Model Validation.”\n#IEEE_O_ACC# 7: 59788–96. https://doi.org/10.1109/access.2019.2915109.\n\n\nLindholm, Erik, John Nickolls, Stuart Oberman, and John Montrym. 2008.\n“NVIDIA Tesla: A Unified Graphics and\nComputing Architecture.” IEEE Micro 28 (2): 39–55. https://doi.org/10.1109/mm.2008.31.\n\n\nLin, Tang Tang, Dang Yang, and Han Gan. 2023. “AWQ:\nActivation-aware Weight Quantization for\nLLM Compression and Acceleration.” ArXiv\nPreprint. https://arxiv.org/abs/2306.00978.\n\n\nLiu, Yanan, Xiaoxia Wei, Jinyu Xiao, Zhijie Liu, Yang Xu, and Yun Tian.\n2020. “Energy Consumption and Emission Mitigation Prediction Based\non Data Center Traffic and PUE for Global Data\nCenters.” Global Energy Interconnection 3 (3): 272–82.\nhttps://doi.org/10.1016/j.gloei.2020.07.008.\n\n\nLiu, Yingcheng, Guo Zhang, Christopher G. Tarolli, Rumen Hristov, Stella\nJensen-Roberts, Emma M. Waddell, Taylor L. Myers, et al. 2022.\n“Monitoring Gait at Home with Radio Waves in\nParkinson’s Disease: A Marker of Severity,\nProgression, and Medication Response.” Sci. Transl. Med.\n14 (663): eadc9669. https://doi.org/10.1126/scitranslmed.adc9669.\n\n\nLoh, Gabriel H. 2008. “3D-Stacked Memory\nArchitectures for Multi-Core Processors.” ACM SIGARCH\nComputer Architecture News 36 (3): 453–64. https://doi.org/10.1145/1394608.1382159.\n\n\nLopez-Paz, David, and Marc’Aurelio Ranzato. 2017. “Gradient\nEpisodic Memory for Continual Learning.” Adv Neural Inf\nProcess Syst 30.\n\n\nLou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. 2013.\n“Accurate Intelligible Models with Pairwise Interactions.”\nIn Proceedings of the 19th ACM SIGKDD International Conference on\nKnowledge Discovery and Data Mining, edited by Inderjit S. Dhillon,\nYehuda Koren, Rayid Ghani, Ted E. Senator, Paul Bradley, Rajesh Parekh,\nJingrui He, Robert L. Grossman, and Ramasamy Uthurusamy, 623–31. ACM. https://doi.org/10.1145/2487575.2487579.\n\n\nLowy, Andrew, Rakesh Pavan, Sina Baharlouei, Meisam Razaviyayn, and\nAhmad Beirami. 2021. “Fermi: Fair Empirical Risk\nMinimization via Exponential Rényi Mutual Information.”\n\n\nLuebke, David. 2008. “CUDA: Scalable\nParallel Programming for High-Performance Scientific Computing.”\nIn 2008 5th IEEE International Symposium on Biomedical Imaging: From\nNano to Macro, 836–38. IEEE. https://doi.org/10.1109/isbi.2008.4541126.\n\n\nLundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to\nInterpreting Model Predictions.” In Advances in Neural\nInformation Processing Systems 30: Annual Conference on Neural\nInformation Processing Systems 2017, December 4-9, 2017, Long Beach, CA,\nUSA, edited by Isabelle Guyon, Ulrike von Luxburg, Samy Bengio,\nHanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett,\n4765–74. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.\n\n\nMa, Dongning, Fred Lin, Alban Desmaison, Joel Coburn, Daniel Moore,\nSriram Sankar, and Xun Jiao. 2024. “Dr.\nDNA: Combating Silent Data Corruptions in Deep\nLearning Using Distribution of Neuron Activations.” In\nProceedings of the 29th ACM International Conference on\nArchitectural Support for Programming Languages and Operating Systems,\nVolume 3, 239–52. ACM. https://doi.org/10.1145/3620666.3651349.\n\n\nMaas, Martin, David G. Andersen, Michael Isard, Mohammad Mahdi\nJavanmard, Kathryn S. McKinley, and Colin Raffel. 2024. “Combining\nMachine Learning and Lifetime-Based Resource Management for Memory\nAllocation and Beyond.” Commun. ACM 67 (4): 87–96. https://doi.org/10.1145/3611018.\n\n\nMaass, Wolfgang. 1997. “Networks of Spiking Neurons:\nThe Third Generation of Neural Network Models.”\nNeural Networks 10 (9): 1659–71. https://doi.org/10.1016/s0893-6080(97)00011-7.\n\n\nMadry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras,\nand Adrian Vladu. 2017. “Towards Deep Learning Models Resistant to\nAdversarial Attacks.” arXiv Preprint arXiv:1706.06083.\n\n\nMahmoud, Abdulrahman, Neeraj Aggarwal, Alex Nobbe, Jose Rodrigo Sanchez\nVicarte, Sarita V. Adve, Christopher W. Fletcher, Iuri Frosio, and Siva\nKumar Sastry Hari. 2020. “PyTorchFI: A\nRuntime Perturbation Tool for DNNs.” In 2020\n50th Annual IEEE/IFIP International Conference on Dependable Systems and\nNetworks Workshops (DSN-w), 25–31. IEEE; IEEE. https://doi.org/10.1109/dsn-w50199.2020.00014.\n\n\nMahmoud, Abdulrahman, Siva Kumar Sastry Hari, Christopher W. Fletcher,\nSarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael\nB. Sullivan, Timothy Tsai, and Stephen W. Keckler. 2021.\n“Optimizing Selective Protection for CNN\nResilience.” In 2021 IEEE 32nd International Symposium on\nSoftware Reliability Engineering (ISSRE), 127–38. IEEE. https://doi.org/10.1109/issre52982.2021.00025.\n\n\nMahmoud, Abdulrahman, Thierry Tambe, Tarek Aloui, David Brooks, and\nGu-Yeon Wei. 2022. “GoldenEye: A\nPlatform for Evaluating Emerging Numerical Data Formats in\nDNN Accelerators.” In 2022 52nd Annual IEEE/IFIP\nInternational Conference on Dependable Systems and Networks (DSN),\n206–14. IEEE. https://doi.org/10.1109/dsn53405.2022.00031.\n\n\nMarković, Danijela, Alice Mizrahi, Damien Querlioz, and Julie Grollier.\n2020. “Physics for Neuromorphic Computing.” Nature\nReviews Physics 2 (9): 499–510. https://doi.org/10.1038/s42254-020-0208-2.\n\n\nMartin, C. Dianne. 1993. “The Myth of the Awesome Thinking\nMachine.” Commun. ACM 36 (4): 120–33. https://doi.org/10.1145/255950.153587.\n\n\nMarulli, Fiammetta, Stefano Marrone, and Laura Verde. 2022.\n“Sensitivity of Machine Learning Approaches to Fake and Untrusted\nData in Healthcare Domain.” Journal of Sensor and Actuator\nNetworks 11 (2): 21. https://doi.org/10.3390/jsan11020021.\n\n\nMaslej, Nestor, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy,\nKatrina Ligett, Terah Lyons, James Manyika, et al. 2023.\n“Artificial Intelligence Index Report 2023.” ArXiv\nPreprint abs/2310.03715. https://arxiv.org/abs/2310.03715.\n\n\nMattson, Peter, Vijay Janapa Reddi, Christine Cheng, Cody Coleman, Greg\nDiamos, David Kanter, Paulius Micikevicius, et al. 2020a.\n“MLPerf: An Industry Standard Benchmark\nSuite for Machine Learning Performance.” IEEE Micro 40\n(2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\n\n\n———, et al. 2020b. “MLPerf: An Industry\nStandard Benchmark Suite for Machine Learning Performance.”\nIEEE Micro 40 (2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\n\n\nMazumder, Mark, Sharad Chitlangia, Colby Banbury, Yiping Kang, Juan\nManuel Ciro, Keith Achorn, Daniel Galvez, et al. 2021.\n“Multilingual Spoken Words Corpus.” In Thirty-Fifth\nConference on Neural Information Processing Systems Datasets and\nBenchmarks Track (Round 2).\n\n\nMcCarthy, John. 1981. “Epistemological Problems of Artificial\nIntelligence.” In Readings in Artificial Intelligence,\n459–65. Elsevier. https://doi.org/10.1016/b978-0-934613-03-3.50035-0.\n\n\nMcMahan, Brendan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise\nAgüera y Arcas. 2017. “Communication-Efficient Learning of Deep\nNetworks from Decentralized Data.” In Proceedings of the 20th\nInternational Conference on Artificial Intelligence and Statistics,\nAISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, edited by\nAarti Singh and Xiaojin (Jerry) Zhu, 54:1273–82. Proceedings of Machine\nLearning Research. PMLR. http://proceedings.mlr.press/v54/mcmahan17a.html.\n\n\nMiller, Charlie. 2019. “Lessons Learned from Hacking a\nCar.” IEEE Design &Amp; Test 36 (6): 7–9. https://doi.org/10.1109/mdat.2018.2863106.\n\n\nMiller, Charlie, and Chris Valasek. 2015. “Remote Exploitation of\nan Unaltered Passenger Vehicle.” Black Hat USA 2015 (S\n91): 1–91.\n\n\nMiller, D. A. B. 2000. “Optical Interconnects to Silicon.”\n#IEEE_J_JSTQE# 6 (6): 1312–17. https://doi.org/10.1109/2944.902184.\n\n\nMills, Andrew, and Stephen Le Hunte. 1997. “An Overview of\nSemiconductor Photocatalysis.” J. Photochem. Photobiol.,\nA 108 (1): 1–35. https://doi.org/10.1016/s1010-6030(97)00118-4.\n\n\nMirhoseini, Azalia, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang,\nEbrahim Songhori, Shen Wang, Young-Joon Lee, et al. 2021. “A Graph\nPlacement Methodology for Fast Chip Design.” Nature 594\n(7862): 207–12. https://doi.org/10.1038/s41586-021-03544-w.\n\n\nMishra, Asit K., Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan\nStosic, Ganesh Venkatesh, Chong Yu, and Paulius Micikevicius. 2021.\n“Accelerating Sparse Deep Neural Networks.” CoRR\nabs/2104.08378. https://arxiv.org/abs/2104.08378.\n\n\nMittal, Sparsh, Gaurav Verma, Brajesh Kaushik, and Farooq A. Khanday.\n2021. “A Survey of SRAM-Based in-Memory Computing\nTechniques and Applications.” J. Syst. Architect. 119\n(October): 102276. https://doi.org/10.1016/j.sysarc.2021.102276.\n\n\nModha, Dharmendra S., Filipp Akopyan, Alexander Andreopoulos,\nRathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta,\net al. 2023. “Neural Inference at the Frontier of Energy, Space,\nand Time.” Science 382 (6668): 329–35. https://doi.org/10.1126/science.adh1174.\n\n\nMohanram, K., and N. A. Touba. 2003. “Partial Error Masking to\nReduce Soft Error Failure Rate in Logic Circuits.” In\nProceedings. 16th IEEE Symposium on Computer Arithmetic,\n433–40. IEEE; IEEE Comput. Soc. https://doi.org/10.1109/dftvs.2003.1250141.\n\n\nMonyei, Chukwuka G., and Kirsten E. H. Jenkins. 2018. “Electrons\nHave No Identity: Setting Right Misrepresentations in\nGoogle and Apple’s Clean Energy Purchasing.”\nEnergy Research &Amp; Social Science 46 (December): 48–51.\nhttps://doi.org/10.1016/j.erss.2018.06.015.\n\n\nMoshawrab, Mohammad, Mehdi Adda, Abdenour Bouzouane, Hussein Ibrahim,\nand Ali Raad. 2023. “Reviewing Federated Learning Aggregation\nAlgorithms; Strategies, Contributions, Limitations and Future\nPerspectives.” Electronics 12 (10): 2287. https://doi.org/10.3390/electronics12102287.\n\n\nMukherjee, S. S., J. Emer, and S. K. Reinhardt. 2005. “The Soft\nError Problem: An Architectural Perspective.” In\n11th International Symposium on High-Performance Computer\nArchitecture, 243–47. IEEE; IEEE. https://doi.org/10.1109/hpca.2005.37.\n\n\nMunshi, Aaftab. 2009. “The OpenCL\nSpecification.” In 2009 IEEE Hot Chips 21 Symposium\n(HCS), 1–314. IEEE. https://doi.org/10.1109/hotchips.2009.7478342.\n\n\nMusk, Elon et al. 2019. “An Integrated Brain-Machine Interface\nPlatform with Thousands of Channels.” J. Med. Internet\nRes. 21 (10): e16194. https://doi.org/10.2196/16194.\n\n\nMyllyaho, Lalli, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen,\nand Tommi Mikkonen. 2022. “On Misbehaviour and Fault Tolerance in\nMachine Learning Systems.” J. Syst. Software 183\n(January): 111096. https://doi.org/10.1016/j.jss.2021.111096.\n\n\nNakano, Jane. 2021. The Geopolitics of Critical Minerals Supply\nChains. JSTOR.\n\n\nNarayanan, Arvind, and Vitaly Shmatikov. 2006. “How to Break\nAnonymity of the Netflix Prize Dataset.” arXiv Preprint\nCs/0610105.\n\n\nNg, Davy Tsz Kit, Jac Ka Lok Leung, Kai Wah Samuel Chu, and Maggie Shen\nQiao. 2021. “AI Literacy: Definition,\nTeaching, Evaluation and Ethical Issues.” Proceedings of the\nAssociation for Information Science and Technology 58 (1): 504–9.\n\n\nNgo, Richard, Lawrence Chan, and Sören Mindermann. 2022. “The\nAlignment Problem from a Deep Learning Perspective.” ArXiv\nPreprint abs/2209.00626. https://arxiv.org/abs/2209.00626.\n\n\nNguyen, Ngoc-Bao, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and\nNgai-Man Cheung. 2023. “Re-Thinking Model Inversion Attacks\nAgainst Deep Neural Networks.” In 2023 IEEE/CVF Conference on\nComputer Vision and Pattern Recognition (CVPR), 16384–93. IEEE. https://doi.org/10.1109/cvpr52729.2023.01572.\n\n\nNorrie, Thomas, Nishant Patil, Doe Hyun Yoon, George Kurian, Sheng Li,\nJames Laudon, Cliff Young, Norman Jouppi, and David Patterson. 2021.\n“The Design Process for Google’s Training Chips:\nTpuv2 and TPUv3.” IEEE Micro\n41 (2): 56–63. https://doi.org/10.1109/mm.2021.3058217.\n\n\nNorthcutt, Curtis G, Anish Athalye, and Jonas Mueller. 2021.\n“Pervasive Label Errors in Test Sets Destabilize Machine Learning\nBenchmarks.” arXiv. https://doi.org/https://doi.org/10.48550/arXiv.2103.14749\narXiv-issued DOI via DataCite.\n\n\nObermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil\nMullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used\nto Manage the Health of Populations.” Science 366\n(6464): 447–53. https://doi.org/10.1126/science.aax2342.\n\n\nOecd. 2023. “A Blueprint for Building National Compute Capacity\nfor Artificial Intelligence.” 350. Organisation for Economic\nCo-Operation; Development (OECD). https://doi.org/10.1787/876367e3-en.\n\n\nOlah, Chris, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael\nPetrov, and Shan Carter. 2020. “Zoom in: An\nIntroduction to Circuits.” Distill 5 (3): e00024–001. https://doi.org/10.23915/distill.00024.001.\n\n\nOliynyk, Daryna, Rudolf Mayer, and Andreas Rauber. 2023. “I Know\nWhat You Trained Last Summer: A Survey on Stealing Machine\nLearning Models and Defences.” ACM Comput. Surv. 55\n(14s): 1–41. https://doi.org/10.1145/3595292.\n\n\nOoko, Samson Otieno, Marvin Muyonga Ogore, Jimmy Nsenga, and Marco\nZennaro. 2021. “TinyML in Africa:\nOpportunities and Challenges.” In 2021 IEEE\nGlobecom Workshops (GC Wkshps), 1–6. IEEE; IEEE. https://doi.org/10.1109/gcwkshps52748.2021.9682107.\n\n\nOprea, Alina, Anoop Singhal, and Apostol Vassilev. 2022.\n“Poisoning Attacks Against Machine Learning: Can\nMachine Learning Be Trustworthy?” Computer 55 (11):\n94–99. https://doi.org/10.1109/mc.2022.3190787.\n\n\nPan, Sinno Jialin, and Qiang Yang. 2010. “A Survey on Transfer\nLearning.” IEEE Trans. Knowl. Data Eng. 22 (10):\n1345–59. https://doi.org/10.1109/tkde.2009.191.\n\n\nPanda, Priyadarshini, Indranil Chakraborty, and Kaushik Roy. 2019.\n“Discretization Based Solutions for Secure Machine Learning\nAgainst Adversarial Attacks.” #IEEE_O_ACC# 7: 70157–68.\nhttps://doi.org/10.1109/access.2019.2919463.\n\n\nPapadimitriou, George, and Dimitris Gizopoulos. 2021.\n“Demystifying the System Vulnerability Stack:\nTransient Fault Effects Across the Layers.” In\n2021 ACM/IEEE 48th Annual International Symposium on Computer\nArchitecture (ISCA), 902–15. IEEE; IEEE. https://doi.org/10.1109/isca52012.2021.00075.\n\n\nPapernot, Nicolas, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram\nSwami. 2016. “Distillation as a Defense to Adversarial\nPerturbations Against Deep Neural Networks.” In 2016 IEEE\nSymposium on Security and Privacy (SP), 582–97. IEEE; IEEE. https://doi.org/10.1109/sp.2016.41.\n\n\nParrish, Alicia, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max\nBartolo, Oana Inel, Juan Ciro, et al. 2023. “Adversarial Nibbler:\nA Data-Centric Challenge for Improving the Safety of\nText-to-Image Models.” ArXiv Preprint abs/2305.14384. https://arxiv.org/abs/2305.14384.\n\n\nPatterson, David A, and John L Hennessy. 2016. Computer Organization\nand Design ARM Edition: The Hardware Software\nInterface. Morgan kaufmann.\n\n\nPatterson, David, Joseph Gonzalez, Urs Holzle, Quoc Le, Chen Liang,\nLluis-Miquel Munguia, Daniel Rothchild, David R. So, Maud Texier, and\nJeff Dean. 2022. “The Carbon Footprint of Machine Learning\nTraining Will Plateau, Then Shrink.” Computer 55 (7):\n18–28. https://doi.org/10.1109/mc.2022.3148714.\n\n\nPeters, Dorian, Rafael A. Calvo, and Richard M. Ryan. 2018.\n“Designing for Motivation, Engagement and Wellbeing in Digital\nExperience.” Front. Psychol. 9 (May): 797. https://doi.org/10.3389/fpsyg.2018.00797.\n\n\nPhillips, P Jonathon, Carina A Hahn, Peter C Fontana, David A\nBroniatowski, and Mark A Przybocki. 2020. “Four Principles of\nExplainable Artificial Intelligence.” Gaithersburg,\nMaryland 18.\n\n\nPlank, James S. 1997. “A Tutorial on\nReedSolomon Coding for Fault-Tolerance in\nRAID-Like Systems.” Software: Practice and\nExperience 27 (9): 995–1012.\n\n\nPont, Michael J, and Royan HL Ong. 2002. “Using Watchdog Timers to\nImprove the Reliability of Single-Processor Embedded Systems:\nSeven New Patterns and a Case Study.” In\nProceedings of the First Nordic Conference on Pattern Languages of\nPrograms, 159–200. Citeseer.\n\n\nPrakash, Shvetank, Tim Callahan, Joseph Bushagour, Colby Banbury, Alan\nV. Green, Pete Warden, Tim Ansell, and Vijay Janapa Reddi. 2023.\n“CFU Playground: Full-stack Open-Source Framework for Tiny Machine\nLearning (TinyML) Acceleration on\nFPGAs.” In 2023 IEEE International Symposium on\nPerformance Analysis of Systems and Software (ISPASS). Vol.\nabs/2201.01863. IEEE. https://doi.org/10.1109/ispass57527.2023.00024.\n\n\nPrakash, Shvetank, Matthew Stewart, Colby Banbury, Mark Mazumder, Pete\nWarden, Brian Plancher, and Vijay Janapa Reddi. 2023. “Is\nTinyML Sustainable? Assessing the Environmental Impacts of\nMachine Learning on Microcontrollers.” ArXiv Preprint.\nhttps://arxiv.org/abs/2301.11899.\n\n\nPsoma, Sotiria D., and Chryso Kanthou. 2023. “Wearable Insulin\nBiosensors for Diabetes Management: Advances and\nChallenges.” Biosensors 13 (7): 719. https://doi.org/10.3390/bios13070719.\n\n\nPushkarna, Mahima, Andrew Zaldivar, and Oddur Kjartansson. 2022.\n“Data Cards: Purposeful and Transparent Dataset\nDocumentation for Responsible AI.” In 2022 ACM\nConference on Fairness, Accountability, and Transparency. ACM. https://doi.org/10.1145/3531146.3533231.\n\n\nPutnam, Andrew, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros\nConstantinides, John Demme, Hadi Esmaeilzadeh, et al. 2014. “A\nReconfigurable Fabric for Accelerating Large-Scale Datacenter\nServices.” ACM SIGARCH Computer Architecture News 42\n(3): 13–24. https://doi.org/10.1145/2678373.2665678.\n\n\nQi, Chen, Shibo Shen, Rongpeng Li, Zhifeng Zhao, Qing Liu, Jing Liang,\nand Honggang Zhang. 2021. “An Efficient Pruning Scheme of Deep\nNeural Networks for Internet of Things Applications.” EURASIP\nJournal on Advances in Signal Processing 2021 (1). https://doi.org/10.1186/s13634-021-00744-4.\n\n\nQian, Yu, Xuegong Zhou, Hao Zhou, and Lingli Wang. 2024. “An\nEfficient Reinforcement Learning Based Framework for Exploring Logic\nSynthesis.” ACM Trans. Des. Autom. Electron. Syst. 29\n(2): 1–33. https://doi.org/10.1145/3632174.\n\n\nR. V., Rashmi, and Karthikeyan A. 2018. “Secure Boot of Embedded\nApplications - a Review.” In 2018 Second International\nConference on Electronics, Communication and Aerospace Technology\n(ICECA), 291–98. IEEE. https://doi.org/10.1109/iceca.2018.8474730.\n\n\nRaina, Rajat, Anand Madhavan, and Andrew Y. Ng. 2009. “Large-Scale\nDeep Unsupervised Learning Using Graphics Processors.” In\nProceedings of the 26th Annual International Conference on Machine\nLearning, edited by Andrea Pohoreckyj Danyluk, Léon Bottou, and\nMichael L. Littman, 382:873–80. ACM International Conference Proceeding\nSeries. ACM. https://doi.org/10.1145/1553374.1553486.\n\n\nRamaswamy, Vikram V., Sunnie S. Y. Kim, Ruth Fong, and Olga Russakovsky.\n2023a. “Overlooked Factors in Concept-Based Explanations:\nDataset Choice, Concept Learnability, and Human\nCapability.” In 2023 IEEE/CVF Conference on Computer Vision\nand Pattern Recognition (CVPR), 10932–41. IEEE. https://doi.org/10.1109/cvpr52729.2023.01052.\n\n\nRamaswamy, Vikram V, Sunnie SY Kim, Ruth Fong, and Olga Russakovsky.\n2023b. “UFO: A Unified Method for\nControlling Understandability and Faithfulness Objectives in\nConcept-Based Explanations for CNNs.” ArXiv\nPreprint abs/2303.15632. https://arxiv.org/abs/2303.15632.\n\n\nRamcharan, Amanda, Kelsee Baranowski, Peter McCloskey, Babuali Ahmed,\nJames Legg, and David P. Hughes. 2017. “Deep Learning for\nImage-Based Cassava Disease Detection.” Front. Plant\nSci. 8 (October): 1852. https://doi.org/10.3389/fpls.2017.01852.\n\n\nRamesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss,\nAlec Radford, Mark Chen, and Ilya Sutskever. 2021. “Zero-Shot\nText-to-Image Generation.” In Proceedings of the 38th\nInternational Conference on Machine Learning, ICML 2021, 18-24 July\n2021, Virtual Event, edited by Marina Meila and Tong Zhang,\n139:8821–31. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/ramesh21a.html.\n\n\nRanganathan, Parthasarathy. 2011. “From Microprocessors to\nNanostores: Rethinking Data-Centric Systems.”\nComputer 44 (1): 39–48. https://doi.org/10.1109/mc.2011.18.\n\n\nRao, Ravi. 2021. Www.wevolver.com. https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies.\n\n\nRashid, Layali, Karthik Pattabiraman, and Sathish Gopalakrishnan. 2012.\n“Intermittent Hardware Errors Recovery: Modeling and\nEvaluation.” In 2012 Ninth International Conference on\nQuantitative Evaluation of Systems, 220–29. IEEE; IEEE. https://doi.org/10.1109/qest.2012.37.\n\n\n———. 2015. “Characterizing the Impact of Intermittent Hardware\nFaults on Programs.” IEEE Trans. Reliab. 64 (1):\n297–310. https://doi.org/10.1109/tr.2014.2363152.\n\n\nRatner, Alex, Braden Hancock, Jared Dunnmon, Roger Goldman, and\nChristopher Ré. 2018. “Snorkel MeTaL: Weak\nSupervision for Multi-Task Learning.” In Proceedings of the\nSecond Workshop on Data Management for End-to-End Machine Learning.\nACM. https://doi.org/10.1145/3209889.3209898.\n\n\nReagen, Brandon, Udit Gupta, Lillian Pentecost, Paul Whatmough, Sae Kyu\nLee, Niamh Mulholland, David Brooks, and Gu-Yeon Wei. 2018. “Ares:\nA Framework for Quantifying the Resilience of Deep Neural\nNetworks.” In 2018 55th ACM/ESDA/IEEE Design Automation\nConference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465834.\n\n\nReagen, Brandon, Jose Miguel Hernandez-Lobato, Robert Adolf, Michael\nGelbart, Paul Whatmough, Gu-Yeon Wei, and David Brooks. 2017. “A\nCase for Efficient Accelerator Design Space Exploration via\nBayesian Optimization.” In 2017 IEEE/ACM\nInternational Symposium on Low Power Electronics and Design\n(ISLPED), 1–6. IEEE; IEEE. https://doi.org/10.1109/islped.2017.8009208.\n\n\nReddi, Vijay Janapa, Christine Cheng, David Kanter, Peter Mattson,\nGuenther Schmuelling, Carole-Jean Wu, Brian Anderson, et al. 2020.\n“MLPerf Inference Benchmark.” In 2020\nACM/IEEE 47th Annual International Symposium on Computer Architecture\n(ISCA), 446–59. IEEE; IEEE. https://doi.org/10.1109/isca45697.2020.00045.\n\n\nReddi, Vijay Janapa, and Meeta Sharma Gupta. 2013. Resilient\nArchitecture Design for Voltage Variation. Springer International\nPublishing. https://doi.org/10.1007/978-3-031-01739-1.\n\n\nReis, G. A., J. Chang, N. Vachharajani, R. Rangan, and D. I. August.\n2005. “SWIFT: Software Implemented Fault\nTolerance.” In International Symposium on Code Generation and\nOptimization, 243–54. IEEE; IEEE. https://doi.org/10.1109/cgo.2005.34.\n\n\nRibeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016.\n“” Why Should i Trust You?” Explaining\nthe Predictions of Any Classifier.” In Proceedings of the\n22nd ACM SIGKDD International Conference on Knowledge Discovery and Data\nMining, 1135–44.\n\n\nRobbins, Herbert, and Sutton Monro. 1951. “A Stochastic\nApproximation Method.” The Annals of Mathematical\nStatistics 22 (3): 400–407. https://doi.org/10.1214/aoms/1177729586.\n\n\nRombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and\nBjorn Ommer. 2022. “High-Resolution Image Synthesis with Latent\nDiffusion Models.” In 2022 IEEE/CVF Conference on Computer\nVision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr52688.2022.01042.\n\n\nRosa, Gustavo H. de, and João P. Papa. 2021. “A Survey on Text\nGeneration Using Generative Adversarial Networks.” Pattern\nRecogn. 119 (November): 108098. https://doi.org/10.1016/j.patcog.2021.108098.\n\n\nRosenblatt, Frank. 1957. The Perceptron, a Perceiving and\nRecognizing Automaton Project Para. Cornell Aeronautical\nLaboratory.\n\n\nRoskies, Adina. 2002. “Neuroethics for the New Millenium.”\nNeuron 35 (1): 21–23. https://doi.org/10.1016/s0896-6273(02)00763-8.\n\n\nRuder, Sebastian. 2016. “An Overview of Gradient Descent\nOptimization Algorithms.” ArXiv Preprint abs/1609.04747.\nhttps://arxiv.org/abs/1609.04747.\n\n\nRudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning\nModels for High Stakes Decisions and Use Interpretable Models\nInstead.” Nature Machine Intelligence 1 (5): 206–15. https://doi.org/10.1038/s42256-019-0048-x.\n\n\nRumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986.\n“Learning Representations by Back-Propagating Errors.”\nNature 323 (6088): 533–36. https://doi.org/10.1038/323533a0.\n\n\nRussakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh,\nSean Ma, Zhiheng Huang, et al. 2015. “ImageNet Large\nScale Visual Recognition Challenge.” Int. J. Comput.\nVision 115 (3): 211–52. https://doi.org/10.1007/s11263-015-0816-y.\n\n\nRussell, Stuart. 2021. “Human-Compatible Artificial\nIntelligence.” Human-Like Machine Intelligence, 3–23.\n\n\nRyan, Richard M., and Edward L. Deci. 2000. “Self-Determination\nTheory and the Facilitation of Intrinsic Motivation, Social Development,\nand Well-Being.” Am. Psychol. 55 (1): 68–78. https://doi.org/10.1037/0003-066x.55.1.68.\n\n\nSamajdar, Ananda, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar\nKrishna. 2018. “Scale-Sim: Systolic Cnn Accelerator\nSimulator.” ArXiv Preprint abs/1811.02883. https://arxiv.org/abs/1811.02883.\n\n\nSambasivan, Nithya, Shivani Kapania, Hannah Highfill, Diana Akrong,\nPraveen Paritosh, and Lora M Aroyo. 2021a.\n““Everyone Wants to Do the Model Work,\nNot the Data Work”: Data Cascades in\nHigh-Stakes AI.” In Proceedings of the 2021 CHI\nConference on Human Factors in Computing Systems, 1–15. ACM. https://doi.org/10.1145/3411764.3445518.\n\n\n———. 2021b. ““Everyone Wants to Do the\nModel Work, Not the Data Work”: Data Cascades\nin High-Stakes AI.” In Proceedings of the 2021\nCHI Conference on Human Factors in Computing Systems. CHI ’21. New\nYork, NY, USA: ACM. https://doi.org/10.1145/3411764.3445518.\n\n\n———. 2021c. ““Everyone Wants to Do the\nModel Work, Not the Data Work”: Data Cascades\nin High-Stakes AI.” In Proceedings of the 2021\nCHI Conference on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/3411764.3445518.\n\n\nSangchoolie, Behrooz, Karthik Pattabiraman, and Johan Karlsson. 2017.\n“One Bit Is (Not) Enough: An Empirical\nStudy of the Impact of Single and Multiple Bit-Flip Errors.” In\n2017 47th Annual IEEE/IFIP International Conference on Dependable\nSystems and Networks (DSN), 97–108. IEEE; IEEE. https://doi.org/10.1109/dsn.2017.30.\n\n\nSchäfer, Mike S. 2023. “The Notorious GPT:\nScience Communication in the Age of Artificial\nIntelligence.” Journal of Science Communication 22 (02):\nY02. https://doi.org/10.22323/2.22020402.\n\n\nSchizas, Nikolaos, Aristeidis Karras, Christos Karras, and Spyros\nSioutas. 2022. “TinyML for Ultra-Low Power\nAI and Large Scale IoT Deployments:\nA Systematic Review.” Future Internet 14\n(12): 363. https://doi.org/10.3390/fi14120363.\n\n\nSchuman, Catherine D., Shruti R. Kulkarni, Maryam Parsa, J. Parker\nMitchell, Prasanna Date, and Bill Kay. 2022. “Opportunities for\nNeuromorphic Computing Algorithms and Applications.” Nature\nComputational Science 2 (1): 10–19. https://doi.org/10.1038/s43588-021-00184-y.\n\n\nSchwartz, Daniel, Jonathan Michael Gomes Selman, Peter Wrege, and\nAndreas Paepcke. 2021. “Deployment of Embedded\nEdge-AI for Wildlife Monitoring in Remote Regions.”\nIn 2021 20th IEEE International Conference on Machine Learning and\nApplications (ICMLA), 1035–42. IEEE; IEEE. https://doi.org/10.1109/icmla52953.2021.00170.\n\n\nSchwartz, Roy, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2020.\n“Green AI.” Commun. ACM 63 (12):\n54–63. https://doi.org/10.1145/3381831.\n\n\nSegal, Mark, and Kurt Akeley. 1999. “The OpenGL\nGraphics System: A Specification (Version 1.1).”\n\n\nSegura Anaya, L. H., Abeer Alsadoon, N. Costadopoulos, and P. W. C.\nPrasad. 2017. “Ethical Implications of User Perceptions of\nWearable Devices.” Sci. Eng. Ethics 24 (1): 1–28. https://doi.org/10.1007/s11948-017-9872-8.\n\n\nSeide, Frank, and Amit Agarwal. 2016. “Cntk: Microsoft’s\nOpen-Source Deep-Learning Toolkit.” In Proceedings of the\n22nd ACM SIGKDD International Conference on Knowledge Discovery and Data\nMining, 2135–35. ACM. https://doi.org/10.1145/2939672.2945397.\n\n\nSelvaraju, Ramprasaath R., Michael Cogswell, Abhishek Das, Ramakrishna\nVedantam, Devi Parikh, and Dhruv Batra. 2017.\n“Grad-CAM: Visual Explanations from Deep\nNetworks via Gradient-Based Localization.” In 2017 IEEE\nInternational Conference on Computer Vision (ICCV), 618–26. IEEE.\nhttps://doi.org/10.1109/iccv.2017.74.\n\n\nSeong, Nak Hee, Dong Hyuk Woo, Vijayalakshmi Srinivasan, Jude A. Rivers,\nand Hsien-Hsin S. Lee. 2010. “SAFER: Stuck-at-fault Error Recovery for\nMemories.” In 2010 43rd Annual IEEE/ACM International\nSymposium on Microarchitecture, 115–24. IEEE; IEEE. https://doi.org/10.1109/micro.2010.46.\n\n\nSeyedzadeh, Saleh, Farzad Pour Rahimian, Ivan Glesk, and Marc Roper.\n2018. “Machine Learning for Estimation of Building Energy\nConsumption and Performance: A Review.”\nVisualization in Engineering 6 (1): 1–20. https://doi.org/10.1186/s40327-018-0064-7.\n\n\nShalev-Shwartz, Shai, Shaked Shammah, and Amnon Shashua. 2017. “On\na Formal Model of Safe and Scalable Self-Driving Cars.” ArXiv\nPreprint abs/1708.06374. https://arxiv.org/abs/1708.06374.\n\n\nShan, Shawn, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y\nZhao. 2023. “Prompt-Specific Poisoning Attacks on Text-to-Image\nGenerative Models.” ArXiv Preprint abs/2310.13828. https://arxiv.org/abs/2310.13828.\n\n\nShastri, Bhavin J., Alexander N. Tait, T. Ferreira de Lima, Wolfram H.\nP. Pernice, Harish Bhaskaran, C. D. Wright, and Paul R. Prucnal. 2021.\n“Photonics for Artificial Intelligence and Neuromorphic\nComputing.” Nat. Photonics 15 (2): 102–14. https://doi.org/10.1038/s41566-020-00754-y.\n\n\nSheaffer, Jeremy W, David P Luebke, and Kevin Skadron. 2007. “A\nHardware Redundancy and Recovery Mechanism for Reliable Scientific\nComputation on Graphics Processors.” In Graphics\nHardware, 2007:55–64. Citeseer.\n\n\nShehabi, Arman, Sarah Smith, Dale Sartor, Richard Brown, Magnus Herrlin,\nJonathan Koomey, Eric Masanet, Nathaniel Horner, Inês Azevedo, and\nWilliam Lintner. 2016. “United States Data Center Energy Usage\nReport.”\n\n\nShen, Sheng, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami,\nMichael W. Mahoney, and Kurt Keutzer. 2020. “Q-BERT:\nHessian Based Ultra Low Precision Quantization of\nBERT.” Proceedings of the AAAI Conference on\nArtificial Intelligence 34 (05): 8815–21. https://doi.org/10.1609/aaai.v34i05.6409.\n\n\nSheng, Victor S., and Jing Zhang. 2019. “Machine Learning with\nCrowdsourcing: A Brief Summary of the Past Research and\nFuture Directions.” Proceedings of the AAAI Conference on\nArtificial Intelligence 33 (01): 9837–43. https://doi.org/10.1609/aaai.v33i01.33019837.\n\n\nShi, Hongrui, and Valentin Radu. 2022. “Data Selection for\nEfficient Model Update in Federated Learning.” In Proceedings\nof the 2nd European Workshop on Machine Learning and Systems,\n72–78. ACM. https://doi.org/10.1145/3517207.3526980.\n\n\nShneiderman, Ben. 2020. “Bridging the Gap Between Ethics and\nPractice: Guidelines for Reliable, Safe, and Trustworthy Human-Centered\nAI Systems.” ACM Trans. Interact. Intell. Syst. 10 (4):\n1–31. https://doi.org/10.1145/3419764.\n\n\n———. 2022. Human-Centered AI. Oxford University\nPress.\n\n\nShokri, Reza, Marco Stronati, Congzheng Song, and Vitaly Shmatikov.\n2017. “Membership Inference Attacks Against Machine Learning\nModels.” In 2017 IEEE Symposium on Security and Privacy\n(SP), 3–18. IEEE; IEEE. https://doi.org/10.1109/sp.2017.41.\n\n\nSiddik, Md Abu Bakar, Arman Shehabi, and Landon Marston. 2021.\n“The Environmental Footprint of Data Centers in the United\nStates.” Environ. Res. Lett. 16 (6): 064017. https://doi.org/10.1088/1748-9326/abfba1.\n\n\nSilvestro, Daniele, Stefano Goria, Thomas Sterner, and Alexandre\nAntonelli. 2022. “Improving Biodiversity Protection Through\nArtificial Intelligence.” Nature Sustainability 5 (5):\n415–24. https://doi.org/10.1038/s41893-022-00851-6.\n\n\nSingh, Narendra, and Oladele A. Ogunseitan. 2022. “Disentangling\nthe Worldwide Web of e-Waste and Climate Change Co-Benefits.”\nCircular Economy 1 (2): 100011. https://doi.org/10.1016/j.cec.2022.100011.\n\n\nSkorobogatov, Sergei. 2009. “Local Heating Attacks on Flash Memory\nDevices.” In 2009 IEEE International Workshop on\nHardware-Oriented Security and Trust, 1–6. IEEE; IEEE. https://doi.org/10.1109/hst.2009.5225028.\n\n\nSkorobogatov, Sergei P, and Ross J Anderson. 2003. “Optical Fault\nInduction Attacks.” In Cryptographic Hardware and Embedded\nSystems-CHES 2002: 4th International Workshop Redwood Shores, CA, USA,\nAugust 1315, 2002 Revised Papers 4, 2–12. Springer.\n\n\nSmilkov, Daniel, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin\nWattenberg. 2017. “Smoothgrad: Removing Noise by\nAdding Noise.” ArXiv Preprint abs/1706.03825. https://arxiv.org/abs/1706.03825.\n\n\nSnoek, Jasper, Hugo Larochelle, and Ryan P. Adams. 2012.\n“Practical Bayesian Optimization of Machine Learning\nAlgorithms.” In Advances in Neural Information Processing\nSystems 25: 26th Annual Conference on Neural Information Processing\nSystems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake\nTahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando\nC. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q.\nWeinberger, 2960–68. https://proceedings.neurips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html.\n\n\nSrivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever,\nand Ruslan Salakhutdinov. 2014. “Dropout: A Simple\nWay to Prevent Neural Networks from Overfitting.” J. Mach.\nLearn. Res. http://jmlr.org/papers/v15/srivastava14a.html.\n\n\nStm32L4Q5Ag. 2021. STMicroelectronics.\n\n\nStrubell, Emma, Ananya Ganesh, and Andrew McCallum. 2019. “Energy\nand Policy Considerations for Deep Learning in NLP.”\nIn Proceedings of the 57th Annual Meeting of the Association for\nComputational Linguistics, 3645–50. Florence, Italy: Association\nfor Computational Linguistics. https://doi.org/10.18653/v1/p19-1355.\n\n\nSuda, Naveen, Vikas Chandra, Ganesh Dasika, Abinash Mohanty, Yufei Ma,\nSarma Vrudhula, Jae-sun Seo, and Yu Cao. 2016.\n“Throughput-Optimized OpenCL-Based FPGA\nAccelerator for Large-Scale Convolutional Neural Networks.” In\nProceedings of the 2016 ACM/SIGDA International Symposium on\nField-Programmable Gate Arrays, 16–25. ACM. https://doi.org/10.1145/2847263.2847276.\n\n\nSudhakar, Soumya, Vivienne Sze, and Sertac Karaman. 2023. “Data\nCenters on Wheels: Emissions from Computing Onboard\nAutonomous Vehicles.” IEEE Micro 43 (1): 29–39. https://doi.org/10.1109/mm.2022.3219803.\n\n\nSze, Vivienne, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2017.\n“Efficient Processing of Deep Neural Networks: A\nTutorial and Survey.” Proc. IEEE 105 (12): 2295–2329. https://doi.org/10.1109/jproc.2017.2761740.\n\n\nSzegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna,\nDumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014.\n“Intriguing Properties of Neural Networks.” In 2nd\nInternational Conference on Learning Representations, ICLR 2014, Banff,\nAB, Canada, April 14-16, 2014, Conference Track Proceedings, edited\nby Yoshua Bengio and Yann LeCun. http://arxiv.org/abs/1312.6199.\n\n\nTambe, Thierry, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa\nReddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020.\n“Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings\nfor Resilient Deep Learning Inference.” In 2020 57th ACM/IEEE\nDesign Automation Conference (DAC), 1–6. IEEE; IEEE. https://doi.org/10.1109/dac18072.2020.9218516.\n\n\nTan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler,\nAndrew Howard, and Quoc V. Le. 2019. “MnasNet: Platform-aware Neural Architecture Search for\nMobile.” In 2019 IEEE/CVF Conference on Computer Vision and\nPattern Recognition (CVPR), 2820–28. IEEE. https://doi.org/10.1109/cvpr.2019.00293.\n\n\nTan, Mingxing, and Quoc V. Le. 2023. “Demystifying Deep\nLearning.” Wiley. https://doi.org/10.1002/9781394205639.ch6.\n\n\nTang, Xin, Yichun He, and Jia Liu. 2022. “Soft Bioelectronics for\nCardiac Interfaces.” Biophysics Reviews 3 (1). https://doi.org/10.1063/5.0069516.\n\n\nTang, Xin, Hao Shen, Siyuan Zhao, Na Li, and Jia Liu. 2023.\n“Flexible Braincomputer Interfaces.”\nNature Electronics 6 (2): 109–18. https://doi.org/10.1038/s41928-022-00913-9.\n\n\nTarun, Ayush K, Vikram S Chundawat, Murari Mandal, and Mohan\nKankanhalli. 2022. “Deep Regression Unlearning.” ArXiv\nPreprint abs/2210.08196. https://arxiv.org/abs/2210.08196.\n\n\nTeam, The Theano Development, Rami Al-Rfou, Guillaume Alain, Amjad\nAlmahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, et\nal. 2016. “Theano: A Python Framework for Fast\nComputation of Mathematical Expressions.” https://arxiv.org/abs/1605.02688.\n\n\n“The Ultimate Guide to Deep Learning Model Quantization and\nQuantization-Aware Training.” n.d. https://deci.ai/quantization-and-quantization-aware-training/.\n\n\nThompson, Neil C., Kristjan Greenewald, Keeheon Lee, and Gabriel F.\nManso. 2021. “Deep Learning’s Diminishing Returns:\nThe Cost of Improvement Is Becoming Unsustainable.”\nIEEE Spectr. 58 (10): 50–55. https://doi.org/10.1109/mspec.2021.9563954.\n\n\nTill, Aaron, Andrew L. Rypel, Andrew Bray, and Samuel B. Fey. 2019.\n“Fish Die-Offs Are Concurrent with Thermal Extremes in North\nTemperate Lakes.” Nat. Clim. Change 9 (8): 637–41. https://doi.org/10.1038/s41558-019-0520-y.\n\n\nTirtalistyani, Rose, Murtiningrum Murtiningrum, and Rameshwar S. Kanwar.\n2022. “Indonesia Rice Irrigation System:\nTime for Innovation.” Sustainability 14\n(19): 12477. https://doi.org/10.3390/su141912477.\n\n\nTokui, Seiya, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa,\nShunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, and Hiroyuki\nYamazaki Vincent. 2019. “Chainer: A Deep Learning Framework for\nAccelerating the Research Cycle.” In Proceedings of the 25th\nACM SIGKDD International Conference on Knowledge Discovery &Amp;\nData Mining, 5:1–6. ACM. https://doi.org/10.1145/3292500.3330756.\n\n\nTramèr, Florian, Pascal Dupré, Gili Rusak, Giancarlo Pellegrino, and Dan\nBoneh. 2019. “AdVersarial: Perceptual Ad Blocking\nMeets Adversarial Machine Learning.” In Proceedings of the\n2019 ACM SIGSAC Conference on Computer and Communications Security,\n2005–21. ACM. https://doi.org/10.1145/3319535.3354222.\n\n\nTsai, Min-Jen, Ping-Yi Lin, and Ming-En Lee. 2023. “Adversarial\nAttacks on Medical Image Classification.” Cancers 15\n(17): 4228. https://doi.org/10.3390/cancers15174228.\n\n\nTsai, Timothy, Siva Kumar Sastry Hari, Michael Sullivan, Oreste Villa,\nand Stephen W. Keckler. 2021. “NVBitFI:\nDynamic Fault Injection for GPUs.” In\n2021 51st Annual IEEE/IFIP International Conference on Dependable\nSystems and Networks (DSN), 284–91. IEEE; IEEE. https://doi.org/10.1109/dsn48987.2021.00041.\n\n\nUddin, Mueen, and Azizah Abdul Rahman. 2012. “Energy Efficiency\nand Low Carbon Enabler Green IT Framework for Data Centers\nConsidering Green Metrics.” Renewable Sustainable Energy\nRev. 16 (6): 4078–94. https://doi.org/10.1016/j.rser.2012.03.014.\n\n\nUn, and World Economic Forum. 2019. A New Circular Vision for\nElectronics, Time for a Global Reboot. PACE - Platform for\nAccelerating the Circular Economy. https://www3.weforum.org/docs/WEF\\_A\\_New\\_Circular\\_Vision\\_for\\_Electronics.pdf.\n\n\nValenzuela, Christine L, and Pearl Y Wang. 2000. “A Genetic\nAlgorithm for VLSI Floorplanning.” In Parallel\nProblem Solving from Nature PPSN VI: 6th International Conference Paris,\nFrance, September 1820, 2000 Proceedings 6, 671–80.\nSpringer.\n\n\nVan Noorden, Richard. 2016. “ArXiv Preprint Server\nPlans Multimillion-Dollar Overhaul.” Nature 534 (7609):\n602–2. https://doi.org/10.1038/534602a.\n\n\nVangal, Sriram, Somnath Paul, Steven Hsu, Amit Agarwal, Saurabh Kumar,\nRam Krishnamurthy, Harish Krishnamurthy, James Tschanz, Vivek De, and\nChris H. Kim. 2021. “Wide-Range Many-Core SoC Design\nin Scaled CMOS: Challenges and\nOpportunities.” IEEE Trans. Very Large Scale Integr. VLSI\nSyst. 29 (5): 843–56. https://doi.org/10.1109/tvlsi.2021.3061649.\n\n\nVaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion\nJones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.\n“Attention Is All You Need.” Advances in Neural\nInformation Processing Systems 30.\n\n\n“Vector-Borne Diseases.” n.d.\nhttps://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases.\n\n\nVelazco, Raoul, Gilles Foucard, and Paul Peronnard. 2010.\n“Combining Results of Accelerated Radiation Tests and Fault\nInjections to Predict the Error Rate of an Application Implemented in\nSRAM-Based FPGAs.” IEEE Trans.\nNucl. Sci. 57 (6): 3500–3505. https://doi.org/10.1109/tns.2010.2087355.\n\n\nVerma, Naveen, Hongyang Jia, Hossein Valavi, Yinqi Tang, Murat Ozatay,\nLung-Yen Chen, Bonan Zhang, and Peter Deaville. 2019. “In-Memory\nComputing: Advances and Prospects.” IEEE\nSolid-State Circuits Mag. 11 (3): 43–55. https://doi.org/10.1109/mssc.2019.2922889.\n\n\nVerma, Team Dual_Boot: Swapnil. 2022. “Elephant\nAI.” Hackster.io. https://www.hackster.io/dual\\_boot/elephant-ai-ba71e9.\n\n\nVinuesa, Ricardo, Hossein Azizpour, Iolanda Leite, Madeline Balaam,\nVirginia Dignum, Sami Domisch, Anna Felländer, Simone Daniela Langhans,\nMax Tegmark, and Francesco Fuso Nerini. 2020. “The Role of\nArtificial Intelligence in Achieving the Sustainable Development\nGoals.” Nat. Commun. 11 (1): 1–10. https://doi.org/10.1038/s41467-019-14108-y.\n\n\nVivet, Pascal, Eric Guthmuller, Yvain Thonnart, Gael Pillonnet, Cesar\nFuguet, Ivan Miro-Panades, Guillaume Moritz, et al. 2021.\n“IntAct: A 96-Core Processor with Six\nChiplets 3D-Stacked on an Active Interposer with\nDistributed Interconnects and Integrated Power Management.”\nIEEE J. Solid-State Circuits 56 (1): 79–97. https://doi.org/10.1109/jssc.2020.3036341.\n\n\nWachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017.\n“Counterfactual Explanations Without Opening the Black Box:\nAutomated Decisions and the GDPR.”\nSSRN Electronic Journal 31: 841. https://doi.org/10.2139/ssrn.3063289.\n\n\nWald, Peter H., and Jeffrey R. Jones. 1987. “Semiconductor\nManufacturing: An Introduction to Processes and\nHazards.” Am. J. Ind. Med. 11 (2): 203–21. https://doi.org/10.1002/ajim.4700110209.\n\n\nWan, Zishen, Aqeel Anwar, Yu-Shun Hsiao, Tianyu Jia, Vijay Janapa Reddi,\nand Arijit Raychowdhury. 2021. “Analyzing and Improving Fault\nTolerance of Learning-Based Navigation Systems.” In 2021 58th\nACM/IEEE Design Automation Conference (DAC), 841–46. IEEE; IEEE. https://doi.org/10.1109/dac18074.2021.9586116.\n\n\nWan, Zishen, Yiming Gan, Bo Yu, S Liu, A Raychowdhury, and Y Zhu. 2023.\n“Vpp: The Vulnerability-Proportional Protection\nParadigm Towards Reliable Autonomous Machines.” In\nProceedings of the 5th International Workshop on Domain Specific\nSystem Architecture (DOSSA), 1–6.\n\n\nWang, LingFeng, and YaQing Zhan. 2019a. “A Conceptual Peer Review\nModel for arXiv and Other Preprint\nDatabases.” Learn. Publ. 32 (3): 213–19. https://doi.org/10.1002/leap.1229.\n\n\n———. 2019b. “A Conceptual Peer Review Model for arXiv and Other Preprint Databases.”\nLearn. Publ. 32 (3): 213–19. https://doi.org/10.1002/leap.1229.\n\n\nWang, Tianzhe, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang,\nYujun Lin, and Song Han. 2020. “APQ:\nJoint Search for Network Architecture, Pruning and\nQuantization Policy.” In 2020 IEEE/CVF Conference on Computer\nVision and Pattern Recognition (CVPR), 2075–84. IEEE. https://doi.org/10.1109/cvpr42600.2020.00215.\n\n\nWarden, Pete. 2018. “Speech Commands: A Dataset for\nLimited-Vocabulary Speech Recognition.” arXiv Preprint\narXiv:1804.03209.\n\n\nWarden, Pete, and Daniel Situnayake. 2019. Tinyml:\nMachine Learning with Tensorflow Lite on Arduino and\nUltra-Low-Power Microcontrollers. O’Reilly Media.\n\n\nWeik, Martin H. 1955. A Survey of Domestic Electronic Digital\nComputing Systems. Ballistic Research Laboratories.\n\n\nWeiser, Mark. 1991. “The Computer for the 21st Century.”\nSci. Am. 265 (3): 94–104. https://doi.org/10.1038/scientificamerican0991-94.\n\n\nWess, Matthias, Matvey Ivanov, Christoph Unger, and Anvesh Nookala.\n2020. “ANNETTE: Accurate Neural Network Execution Time Estimation\nwith Stacked Models.” IEEE. https://doi.org/10.1109/ACCESS.2020.3047259.\n\n\nWiener, Norbert. 1960. “Some Moral and Technical Consequences of\nAutomation: As Machines Learn They May Develop Unforeseen Strategies at\nRates That Baffle Their Programmers.” Science 131\n(3410): 1355–58. https://doi.org/10.1126/science.131.3410.1355.\n\n\nWilkening, Mark, Vilas Sridharan, Si Li, Fritz Previlon, Sudhanva\nGurumurthi, and David R. Kaeli. 2014. “Calculating Architectural\nVulnerability Factors for Spatial Multi-Bit Transient Faults.” In\n2014 47th Annual IEEE/ACM International Symposium on\nMicroarchitecture, 293–305. IEEE; IEEE. https://doi.org/10.1109/micro.2014.15.\n\n\nWinkler, Harald, Franck Lecocq, Hans Lofgren, Maria Virginia Vilariño,\nSivan Kartha, and Joana Portugal-Pereira. 2022. “Examples of\nShifting Development Pathways: Lessons on How to Enable\nBroader, Deeper, and Faster Climate Action.” Climate\nAction 1 (1). https://doi.org/10.1007/s44168-022-00026-1.\n\n\nWong, H.-S. Philip, Heng-Yuan Lee, Shimeng Yu, Yu-Sheng Chen, Yi Wu,\nPang-Shiu Chen, Byoungil Lee, Frederick T. Chen, and Ming-Jinn Tsai.\n2012. “MetalOxide\nRRAM.” Proc. IEEE 100 (6): 1951–70. https://doi.org/10.1109/jproc.2012.2190369.\n\n\nWu, Bichen, Kurt Keutzer, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang,\nFei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, and Yangqing Jia. 2019.\n“FBNet: Hardware-aware\nEfficient ConvNet Design via Differentiable Neural\nArchitecture Search.” In 2019 IEEE/CVF Conference on Computer\nVision and Pattern Recognition (CVPR), 10734–42. IEEE. https://doi.org/10.1109/cvpr.2019.01099.\n\n\nWu, Carole-Jean, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha\nArdalani, Kiwan Maeng, Gloria Chang, et al. 2022. “Sustainable Ai:\nEnvironmental Implications, Challenges and\nOpportunities.” Proceedings of Machine Learning and\nSystems 4: 795–813.\n\n\nWu, Zhang Judd, and Micikevicius Isaev. 2020. “Integer\nQuantization for Deep Learning Inference: Principles and\nEmpirical Evaluation).” ArXiv Preprint. https://arxiv.org/abs/2004.09602.\n\n\nXiao, Seznec Lin, Demouth Wu, and Han. 2022.\n“SmoothQuant: Accurate and Efficient\nPost-Training Quantization for Large Language Models.” ArXiv\nPreprint. https://arxiv.org/abs/2211.10438.\n\n\nXie, Cihang, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L. Yuille, and\nQuoc V. Le. 2020. “Adversarial Examples Improve Image\nRecognition.” In 2020 IEEE/CVF Conference on Computer Vision\nand Pattern Recognition (CVPR), 816–25. IEEE. https://doi.org/10.1109/cvpr42600.2020.00090.\n\n\nXie, Saining, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He.\n2017. “Aggregated Residual Transformations for Deep Neural\nNetworks.” In 2017 IEEE Conference on Computer Vision and\nPattern Recognition (CVPR), 1492–1500. IEEE. https://doi.org/10.1109/cvpr.2017.634.\n\n\nXinyu, Chen. n.d.\n\n\nXiong, Siyu, Guoqing Wu, Xitian Fan, Xuan Feng, Zhongcheng Huang, Wei\nCao, Xuegong Zhou, et al. 2021. “MRI-Based Brain\nTumor Segmentation Using FPGA-Accelerated Neural\nNetwork.” BMC Bioinf. 22 (1): 421. https://doi.org/10.1186/s12859-021-04347-6.\n\n\nXiu, Liming. 2019. “Time Moore: Exploiting Moore’s Law from the Perspective of Time.”\nIEEE Solid-State Circuits Mag. 11 (1): 39–55. https://doi.org/10.1109/mssc.2018.2882285.\n\n\nXu, Chen, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong\nWang, and Hongbin Zha. 2018. “Alternating Multi-Bit Quantization\nfor Recurrent Neural Networks.” In 6th International\nConference on Learning Representations, ICLR 2018, Vancouver, BC,\nCanada, April 30 - May 3, 2018, Conference Track Proceedings.\nOpenReview.net. https://openreview.net/forum?id=S19dR9x0b.\n\n\nXu, Hu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes,\nVasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, and Christoph\nFeichtenhofer. 2023. “Demystifying CLIP Data.”\nArXiv Preprint abs/2309.16671. https://arxiv.org/abs/2309.16671.\n\n\nXu, Ying, Xu Zhong, Antonio Jimeno Yepes, and Jey Han Lau. 2021.\n“Grey-Box Adversarial Attack and Defence for\nSentiment Classification.” arXiv Preprint\narXiv:2103.11576.\n\n\nXu, Zheng, Yanxiang Zhang, Galen Andrew, Christopher A Choquette-Choo,\nPeter Kairouz, H Brendan McMahan, Jesse Rosenstock, and Yuanbo Zhang.\n2023. “Federated Learning of Gboard Language Models with\nDifferential Privacy.” ArXiv Preprint abs/2305.18465. https://arxiv.org/abs/2305.18465.\n\n\nYang, Tien-Ju, Yonghui Xiao, Giovanni Motta, Françoise Beaufays, Rajiv\nMathews, and Mingqing Chen. 2023. “Online Model Compression for\nFederated Learning with Large Models.” In ICASSP 2023 - 2023\nIEEE International Conference on Acoustics, Speech and Signal Processing\n(ICASSP), 1–5. IEEE; IEEE. https://doi.org/10.1109/icassp49357.2023.10097124.\n\n\nYe, Linfeng, and Shayan Mohajer Hamidi. 2021. “Thundernna:\nA White Box Adversarial Attack.” arXiv Preprint\narXiv:2111.12305.\n\n\nYeh, Y. C. 1996. “Triple-Triple Redundant 777 Primary Flight\nComputer.” In 1996 IEEE Aerospace Applications Conference.\nProceedings, 1:293–307. IEEE; IEEE. https://doi.org/10.1109/aero.1996.495891.\n\n\nYik, Jason, Soikat Hasan Ahmed, Zergham Ahmed, Brian Anderson, Andreas\nG. Andreou, Chiara Bartolozzi, Arindam Basu, et al. 2023.\n“NeuroBench: Advancing Neuromorphic\nComputing Through Collaborative, Fair and Representative\nBenchmarking.” https://arxiv.org/abs/2304.04640.\n\n\nYou, Jie, Jae-Won Chung, and Mosharaf Chowdhury. 2023. “Zeus:\nUnderstanding and Optimizing GPU Energy\nConsumption of DNN Training.” In 20th USENIX\nSymposium on Networked Systems Design and Implementation (NSDI 23),\n119–39. Boston, MA: USENIX Association. https://www.usenix.org/conference/nsdi23/presentation/you.\n\n\nYou, Yang, Zhao Zhang, Cho-Jui Hsieh, James Demmel, and Kurt Keutzer.\n2018. “ImageNet Training in Minutes.” https://arxiv.org/abs/1709.05011.\n\n\nYoung, Tom, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. 2018.\n“Recent Trends in Deep Learning Based Natural Language Processing\n[Review Article].” IEEE Comput. Intell.\nMag. 13 (3): 55–75. https://doi.org/10.1109/mci.2018.2840738.\n\n\nYu, Yuan, Martı́n Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy\nDavis, Jeff Dean, et al. 2018. “Dynamic Control Flow in\nLarge-Scale Machine Learning.” In Proceedings of the\nThirteenth EuroSys Conference, 265–83. ACM. https://doi.org/10.1145/3190508.3190551.\n\n\nZafrir, Ofir, Guy Boudoukh, Peter Izsak, and Moshe Wasserblat. 2019.\n“Q8BERT: Quantized 8Bit\nBERT.” In 2019 Fifth Workshop on Energy\nEfficient Machine Learning and Cognitive Computing - NeurIPS Edition\n(EMC2-NIPS), 36–39. IEEE; IEEE. https://doi.org/10.1109/emc2-nips53020.2019.00016.\n\n\nZeiler, Matthew D. 2012. “Reinforcement and Systemic Machine\nLearning for Decision Making.” Wiley. https://doi.org/10.1002/9781118266502.ch6.\n\n\nZennaro, Marco, Brian Plancher, and V Janapa Reddi. 2022.\n“TinyML: Applied AI for\nDevelopment.” In The UN 7th Multi-Stakeholder Forum on\nScience, Technology and Innovation for the Sustainable Development\nGoals, 2022–05.\n\n\nZhang, Chen, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason\nOptimizing Cong. 2015. “FPGA-Based Accelerator Design\nfor Deep Convolutional Neural Networks Proceedings of the 2015\nACM.” In SIGDA International Symposium on\nField-Programmable Gate Arrays-FPGA, 15:161–70.\n\n\nZhang, Dan, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna\nGoldie, and Azalia Mirhoseini. 2022. “A Full-Stack Search\nTechnique for Domain Optimized Deep Learning Accelerators.” In\nProceedings of the 27th ACM International Conference on\nArchitectural Support for Programming Languages and Operating\nSystems, 27–42. ASPLOS ’22. New York, NY, USA: ACM. https://doi.org/10.1145/3503222.3507767.\n\n\nZhang, Dongxia, Xiaoqing Han, and Chunyu Deng. 2018. “Review on\nthe Research and Practice of Deep Learning and Reinforcement Learning in\nSmart Grids.” CSEE Journal of Power and Energy Systems 4\n(3): 362–70. https://doi.org/10.17775/cseejpes.2018.00520.\n\n\nZhang, Hongyu. 2008. “On the Distribution of Software\nFaults.” IEEE Trans. Software Eng. 34 (2): 301–2. https://doi.org/10.1109/tse.2007.70771.\n\n\nZhang, Jeff Jun, Tianyu Gu, Kanad Basu, and Siddharth Garg. 2018.\n“Analyzing and Mitigating the Impact of Permanent Faults on a\nSystolic Array Based Neural Network Accelerator.” In 2018\nIEEE 36th VLSI Test Symposium (VTS), 1–6. IEEE; IEEE. https://doi.org/10.1109/vts.2018.8368656.\n\n\nZhang, Jeff, Kartheek Rangineni, Zahra Ghodsi, and Siddharth Garg. 2018.\n“ThUnderVolt: Enabling Aggressive\nVoltage Underscaling and Timing Error Resilience for Energy Efficient\nDeep Learning Accelerators.” In 2018 55th ACM/ESDA/IEEE\nDesign Automation Conference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465918.\n\n\nZhang, Li Lyna, Yuqing Yang, Yuhang Jiang, Wenwu Zhu, and Yunxin Liu.\n2020. “Fast Hardware-Aware Neural Architecture Search.” In\n2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition\nWorkshops (CVPRW). IEEE. https://doi.org/10.1109/cvprw50498.2020.00354.\n\n\nZhang, Qingxue, Dian Zhou, and Xuan Zeng. 2017. “Highly Wearable\nCuff-Less Blood Pressure and Heart Rate Monitoring with Single-Arm\nElectrocardiogram and Photoplethysmogram Signals.” BioMedical\nEngineering OnLine 16 (1): 23. https://doi.org/10.1186/s12938-017-0317-z.\n\n\nZhang, Tunhou, Hsin-Pai Cheng, Zhenwen Li, Feng Yan, Chengyu Huang, Hai\nHelen Li, and Yiran Chen. 2020. “AutoShrink:\nA Topology-Aware NAS for Discovering Efficient\nNeural Architecture.” In The Thirty-Fourth AAAI Conference on\nArtificial Intelligence, AAAI 2020, the Thirty-Second Innovative\nApplications of Artificial Intelligence Conference, IAAI 2020, the Tenth\nAAAI Symposium on Educational Advances in Artificial Intelligence, EAAI\n2020, New York, NY, USA, February 7-12, 2020, 6829–36. AAAI Press.\nhttps://aaai.org/ojs/index.php/AAAI/article/view/6163.\n\n\nZhao, Mark, and G. Edward Suh. 2018. “FPGA-Based\nRemote Power Side-Channel Attacks.” In 2018 IEEE Symposium on\nSecurity and Privacy (SP), 229–44. IEEE; IEEE. https://doi.org/10.1109/sp.2018.00049.\n\n\nZhao, Yue, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas\nChandra. 2018. “Federated Learning with Non-Iid Data.”\nArXiv Preprint abs/1806.00582. https://arxiv.org/abs/1806.00582.\n\n\nZhou, Bolei, Yiyou Sun, David Bau, and Antonio Torralba. 2018.\n“Interpretable Basis Decomposition for Visual Explanation.”\nIn Proceedings of the European Conference on Computer Vision\n(ECCV), 119–34.\n\n\nZhou, Chuteng, Fernando Garcia Redondo, Julian Büchel, Irem Boybat,\nXavier Timoneda Comas, S. R. Nandakumar, Shidhartha Das, Abu Sebastian,\nManuel Le Gallo, and Paul N. Whatmough. 2021.\n“AnalogNets: Ml-hw\nCo-Design of Noise-Robust TinyML Models and Always-on\nAnalog Compute-in-Memory Accelerator.” https://arxiv.org/abs/2111.06503.\n\n\nZhou, Guanglei, and Jason H. Anderson. 2023. “Area-Driven\nFPGA Logic Synthesis Using Reinforcement Learning.”\nIn Proceedings of the 28th Asia and South Pacific Design Automation\nConference, 159–65. ACM. https://doi.org/10.1145/3566097.3567894.\n\n\nZhou, Peng, Xintong Han, Vlad I. Morariu, and Larry S. Davis. 2018.\n“Learning Rich Features for Image Manipulation Detection.”\nIn 2018 IEEE/CVF Conference on Computer Vision and Pattern\nRecognition, 1053–61. IEEE. https://doi.org/10.1109/cvpr.2018.00116.\n\n\nZhu, Hongyu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand\nJayarajan, Amar Phanishayee, Bianca Schroeder, and Gennady Pekhimenko.\n2018. “Benchmarking and Analyzing Deep Neural Network\nTraining.” In 2018 IEEE International Symposium on Workload\nCharacterization (IISWC), 88–100. IEEE; IEEE. https://doi.org/10.1109/iiswc.2018.8573476.\n\n\nZhu, Ligeng, Lanxiang Hu, Ji Lin, Wei-Ming Chen, Wei-Chen Wang, Chuang\nGan, and Song Han. 2023. “PockEngine:\nSparse and Efficient Fine-Tuning in a Pocket.” In\n56th Annual IEEE/ACM International Symposium on\nMicroarchitecture. ACM. https://doi.org/10.1145/3613424.3614307.\n\n\nZhuang, Fuzhen, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu\nZhu, Hui Xiong, and Qing He. 2021. “A Comprehensive Survey on\nTransfer Learning.” Proc. IEEE 109 (1): 43–76. https://doi.org/10.1109/jproc.2020.3004555.\n\n\nZoph, Barret, and Quoc V. Le. 2023. “Cybernetical\nIntelligence.” Wiley. https://doi.org/10.1002/9781394217519.ch17.", + "text": "Abadi, Martin, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya\nMironov, Kunal Talwar, and Li Zhang. 2016. “Deep Learning with\nDifferential Privacy.” In Proceedings of the 2016 ACM SIGSAC\nConference on Computer and Communications Security, 308–18. CCS\n’16. New York, NY, USA: ACM. https://doi.org/10.1145/2976749.2978318.\n\n\nAbdelkader, Ahmed, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi\nSchwarzschild, Manli Shu, Christoph Studer, and Chen Zhu. 2020.\n“Headless Horseman: Adversarial Attacks on Transfer\nLearning Models.” In ICASSP 2020 - 2020 IEEE International\nConference on Acoustics, Speech and Signal Processing (ICASSP),\n3087–91. IEEE. https://doi.org/10.1109/icassp40776.2020.9053181.\n\n\nAddepalli, Sravanti, B. S. Vivek, Arya Baburaj, Gaurang Sriramanan, and\nR. Venkatesh Babu. 2020. “Towards Achieving Adversarial Robustness\nby Enforcing Feature Consistency Across Bit Planes.” In 2020\nIEEE/CVF Conference on Computer Vision and Pattern Recognition\n(CVPR), 1020–29. IEEE. https://doi.org/10.1109/cvpr42600.2020.00110.\n\n\nAdolf, Robert, Saketh Rama, Brandon Reagen, Gu-yeon Wei, and David\nBrooks. 2016. “Fathom: Reference Workloads for Modern\nDeep Learning Methods.” In 2016 IEEE International Symposium\non Workload Characterization (IISWC), 1–10. IEEE; IEEE. https://doi.org/10.1109/iiswc.2016.7581275.\n\n\nAgarwal, Alekh, Alina Beygelzimer, Miroslav Dudı́k, John Langford, and\nHanna M. Wallach. 2018. “A Reductions Approach to Fair\nClassification.” In Proceedings of the 35th International\nConference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm,\nSweden, July 10-15, 2018, edited by Jennifer G. Dy and Andreas\nKrause, 80:60–69. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v80/agarwal18a.html.\n\n\nAgnesina, Anthony, Puranjay Rajvanshi, Tian Yang, Geraldo Pradipta,\nAustin Jiao, Ben Keller, Brucek Khailany, and Haoxing Ren. 2023.\n“AutoDMP: Automated DREAMPlace-Based Macro\nPlacement.” In Proceedings of the 2023 International\nSymposium on Physical Design, 149–57. ACM. https://doi.org/10.1145/3569052.3578923.\n\n\nAgrawal, Dakshi, Selcuk Baktir, Deniz Karakoyunlu, Pankaj Rohatgi, and\nBerk Sunar. 2007. “Trojan Detection Using\nIC Fingerprinting.” In 2007 IEEE Symposium on\nSecurity and Privacy (SP ’07), 29–45. Springer; IEEE. https://doi.org/10.1109/sp.2007.36.\n\n\nAhmadilivani, Mohammad Hasan, Mahdi Taheri, Jaan Raik, Masoud\nDaneshtalab, and Maksim Jenihhin. 2024. “A Systematic Literature\nReview on Hardware Reliability Assessment Methods for Deep Neural\nNetworks.” ACM Comput. Surv. 56 (6): 1–39. https://doi.org/10.1145/3638242.\n\n\nAledhari, Mohammed, Rehma Razzak, Reza M. Parizi, and Fahad Saeed. 2020.\n“Federated Learning: A Survey on Enabling\nTechnologies, Protocols, and Applications.” #IEEE_O_ACC#\n8: 140699–725. https://doi.org/10.1109/access.2020.3013541.\n\n\nAlghamdi, Wael, Hsiang Hsu, Haewon Jeong, Hao Wang, Peter Michalak,\nShahab Asoodeh, and Flavio Calmon. 2022. “Beyond Adult and\nCOMPAS: Fair Multi-Class Prediction via\nInformation Projection.” Adv. Neur. In. 35: 38747–60.\n\n\nAltayeb, Moez, Marco Zennaro, and Marcelo Rovai. 2022.\n“Classifying Mosquito Wingbeat Sound Using\nTinyML.” In Proceedings of the 2022 ACM\nConference on Information Technology for Social Good, 132–37. ACM.\nhttps://doi.org/10.1145/3524458.3547258.\n\n\nAmiel, Frederic, Christophe Clavier, and Michael Tunstall. 2006.\n“Fault Analysis of DPA-Resistant Algorithms.”\nIn International Workshop on Fault Diagnosis and Tolerance in\nCryptography, 223–36. Springer.\n\n\nAnsel, Jason, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain,\nMichael Voznesensky, Bin Bao, et al. 2024. “PyTorch\n2: Faster Machine Learning Through Dynamic Python Bytecode\nTransformation and Graph Compilation.” In Proceedings of the\n29th ACM International Conference on Architectural Support for\nProgramming Languages and Operating Systems, Volume 2, edited by\nHanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence\nd’Alché-Buc, Emily B. Fox, and Roman Garnett, 8024–35. ACM. https://doi.org/10.1145/3620665.3640366.\n\n\nAnthony, Lasse F. Wolff, Benjamin Kanding, and Raghavendra Selvan. 2020.\nICML Workshop on Challenges in Deploying and monitoring Machine Learning\nSystems.\n\n\nAntol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv\nBatra, C. Lawrence Zitnick, and Devi Parikh. 2015.\n“VQA: Visual Question Answering.”\nIn 2015 IEEE International Conference on Computer Vision\n(ICCV), 2425–33. IEEE. https://doi.org/10.1109/iccv.2015.279.\n\n\nAntonakakis, Manos, Tim April, Michael Bailey, Matt Bernhard, Elie\nBursztein, Jaime Cochran, Zakir Durumeric, et al. 2017.\n“Understanding the Mirai Botnet.” In 26th USENIX\nSecurity Symposium (USENIX Security 17), 1093–1110.\n\n\nArdila, Rosana, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer,\nMichael Henretty, Reuben Morais, Lindsay Saunders, Francis Tyers, and\nGregor Weber. 2020. “Common Voice: A\nMassively-Multilingual Speech Corpus.” In Proceedings of the\nTwelfth Language Resources and Evaluation Conference, 4218–22.\nMarseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.520.\n\n\nArifeen, Tooba, Abdus Sami Hassan, and Jeong-A Lee. 2020.\n“Approximate Triple Modular Redundancy: A\nSurvey.” #IEEE_O_ACC# 8: 139851–67. https://doi.org/10.1109/access.2020.3012673.\n\n\nAsonov, D., and R. Agrawal. 2004. “Keyboard Acoustic\nEmanations.” In IEEE Symposium on Security and Privacy, 2004.\nProceedings. 2004, 3–11. IEEE; IEEE. https://doi.org/10.1109/secpri.2004.1301311.\n\n\nAteniese, Giuseppe, Luigi V. Mancini, Angelo Spognardi, Antonio Villani,\nDomenico Vitali, and Giovanni Felici. 2015. “Hacking Smart\nMachines with Smarter Ones: How to Extract Meaningful Data\nfrom Machine Learning Classifiers.” Int. J. Secur. Netw.\n10 (3): 137. https://doi.org/10.1504/ijsn.2015.071829.\n\n\nAttia, Zachi I., Alan Sugrue, Samuel J. Asirvatham, Michael J. Ackerman,\nSuraj Kapa, Paul A. Friedman, and Peter A. Noseworthy. 2018.\n“Noninvasive Assessment of Dofetilide Plasma Concentration Using a\nDeep Learning (Neural Network) Analysis of the Surface\nElectrocardiogram: A Proof of Concept Study.”\nPLoS One 13 (8): e0201059. https://doi.org/10.1371/journal.pone.0201059.\n\n\nAygun, Sercan, Ece Olcay Gunes, and Christophe De Vleeschouwer. 2021.\n“Efficient and Robust Bitstream Processing in Binarised Neural\nNetworks.” Electron. Lett. 57 (5): 219–22. https://doi.org/10.1049/ell2.12045.\n\n\nBai, Tao, Jinqi Luo, Jun Zhao, Bihan Wen, and Qian Wang. 2021.\n“Recent Advances in Adversarial Training for Adversarial\nRobustness.” arXiv Preprint arXiv:2102.01356.\n\n\nBains, Sunny. 2020. “The Business of Building Brains.”\nNature Electronics 3 (7): 348–51. https://doi.org/10.1038/s41928-020-0449-1.\n\n\nBamoumen, Hatim, Anas Temouden, Nabil Benamar, and Yousra Chtouki. 2022.\n“How TinyML Can Be Leveraged to Solve Environmental\nProblems: A Survey.” In 2022 International\nConference on Innovation and Intelligence for Informatics, Computing,\nand Technologies (3ICT), 338–43. IEEE; IEEE. https://doi.org/10.1109/3ict56508.2022.9990661.\n\n\nBank, Dor, Noam Koenigstein, and Raja Giryes. 2023.\n“Autoencoders.” Machine Learning for Data Science\nHandbook: Data Mining and Knowledge Discovery Handbook, 353–74.\n\n\nBannon, Pete, Ganesh Venkataramanan, Debjit Das Sarma, and Emil Talpes.\n2019. “Computer and Redundancy Solution for the Full Self-Driving\nComputer.” In 2019 IEEE Hot Chips 31 Symposium (HCS),\n1–22. IEEE Computer Society; IEEE. https://doi.org/10.1109/hotchips.2019.8875645.\n\n\nBarenghi, Alessandro, Guido M. Bertoni, Luca Breveglieri, Mauro\nPellicioli, and Gerardo Pelosi. 2010. “Low Voltage Fault Attacks\nto AES.” In 2010 IEEE International Symposium on\nHardware-Oriented Security and Trust (HOST), 7–12. IEEE; IEEE. https://doi.org/10.1109/hst.2010.5513121.\n\n\nBarroso, Luiz André, Urs Hölzle, and Parthasarathy Ranganathan. 2019.\nThe Datacenter as a Computer: Designing Warehouse-Scale\nMachines. Springer International Publishing. https://doi.org/10.1007/978-3-031-01761-2.\n\n\nBau, David, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba.\n2017. “Network Dissection: Quantifying\nInterpretability of Deep Visual Representations.” In 2017\nIEEE Conference on Computer Vision and Pattern Recognition (CVPR),\n3319–27. IEEE. https://doi.org/10.1109/cvpr.2017.354.\n\n\nBeaton, Albert E., and John W. Tukey. 1974. “The Fitting of Power\nSeries, Meaning Polynomials, Illustrated on Band-Spectroscopic\nData.” Technometrics 16 (2): 147. https://doi.org/10.2307/1267936.\n\n\nBeck, Nathaniel, and Simon Jackman. 1998. “Beyond Linearity by\nDefault: Generalized Additive Models.” Am. J.\nPolit. Sci. 42 (2): 596. https://doi.org/10.2307/2991772.\n\n\nBender, Emily M., and Batya Friedman. 2018. “Data Statements for\nNatural Language Processing: Toward Mitigating System Bias\nand Enabling Better Science.” Transactions of the Association\nfor Computational Linguistics 6 (December): 587–604. https://doi.org/10.1162/tacl_a_00041.\n\n\nBerger, Vance W, and YanYan Zhou. 2014.\n“Kolmogorovsmirnov Test:\nOverview.” Wiley Statsref: Statistics Reference\nOnline.\n\n\nBeyer, Lucas, Olivier J Hénaff, Alexander Kolesnikov, Xiaohua Zhai, and\nAäron van den Oord. 2020. “Are We Done with Imagenet?”\nArXiv Preprint abs/2006.07159. https://arxiv.org/abs/2006.07159.\n\n\nBhagoji, Arjun Nitin, Warren He, Bo Li, and Dawn Song. 2018.\n“Practical Black-Box Attacks on Deep Neural Networks Using\nEfficient Query Mechanisms.” In Proceedings of the European\nConference on Computer Vision (ECCV), 154–69.\n\n\nBhardwaj, Kshitij, Marton Havasi, Yuan Yao, David M. Brooks, José Miguel\nHernández-Lobato, and Gu-Yeon Wei. 2020. “A Comprehensive\nMethodology to Determine Optimal Coherence Interfaces for\nMany-Accelerator SoCs.” In Proceedings of the\nACM/IEEE International Symposium on Low Power Electronics and\nDesign, 145–50. ACM. https://doi.org/10.1145/3370748.3406564.\n\n\nBiega, Asia J., Peter Potash, Hal Daumé, Fernando Diaz, and Michèle\nFinck. 2020. “Operationalizing the Legal Principle of Data\nMinimization for Personalization.” In Proceedings of the 43rd\nInternational ACM SIGIR Conference on Research and Development in\nInformation Retrieval, edited by Jimmy Huang, Yi Chang, Xueqi\nCheng, Jaap Kamps, Vanessa Murdock, Ji-Rong Wen, and Yiqun Liu, 399–408.\nACM. https://doi.org/10.1145/3397271.3401034.\n\n\nBiggio, Battista, Blaine Nelson, and Pavel Laskov. 2012.\n“Poisoning Attacks Against Support Vector Machines.” In\nProceedings of the 29th International Conference on Machine\nLearning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1,\n2012. icml.cc / Omnipress. http://icml.cc/2012/papers/880.pdf.\n\n\nBiggs, John, James Myers, Jedrzej Kufel, Emre Ozer, Simon Craske, Antony\nSou, Catherine Ramsdale, Ken Williamson, Richard Price, and Scott White.\n2021. “A Natively Flexible 32-Bit Arm Microprocessor.”\nNature 595 (7868): 532–36. https://doi.org/10.1038/s41586-021-03625-w.\n\n\nBinkert, Nathan, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt,\nAli Saidi, Arkaprava Basu, Joel Hestness, et al. 2011. “The Gem5\nSimulator.” ACM SIGARCH Computer Architecture News 39\n(2): 1–7. https://doi.org/10.1145/2024716.2024718.\n\n\nBohr, Adam, and Kaveh Memarzadeh. 2020. “The Rise of Artificial\nIntelligence in Healthcare Applications.” In Artificial\nIntelligence in Healthcare, 25–60. Elsevier. https://doi.org/10.1016/b978-0-12-818438-7.00002-2.\n\n\nBolchini, Cristiana, Luca Cassano, Antonio Miele, and Alessandro Toschi.\n2023. “Fast and Accurate Error Simulation for CNNs\nAgainst Soft Errors.” IEEE Trans. Comput. 72 (4):\n984–97. https://doi.org/10.1109/tc.2022.3184274.\n\n\nBondi, Elizabeth, Ashish Kapoor, Debadeepta Dey, James Piavis, Shital\nShah, Robert Hannaford, Arvind Iyer, Lucas Joppa, and Milind Tambe.\n2018. “Near Real-Time Detection of Poachers from Drones in\nAirSim.” In Proceedings of the Twenty-Seventh\nInternational Joint Conference on Artificial Intelligence, edited\nby Jérôme Lang, 5814–16. International Joint Conferences on Artificial\nIntelligence Organization. https://doi.org/10.24963/ijcai.2018/847.\n\n\nBourtoule, Lucas, Varun Chandrasekaran, Christopher A. Choquette-Choo,\nHengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas\nPapernot. 2021. “Machine Unlearning.” In 2021 IEEE\nSymposium on Security and Privacy (SP), 141–59. IEEE; IEEE. https://doi.org/10.1109/sp40001.2021.00019.\n\n\nBreier, Jakub, Xiaolu Hou, Dirmanto Jap, Lei Ma, Shivam Bhasin, and Yang\nLiu. 2018. “Deeplaser: Practical Fault Attack on Deep\nNeural Networks.” ArXiv Preprint abs/1806.05859. https://arxiv.org/abs/1806.05859.\n\n\nBrown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan,\nPrafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language\nModels Are Few-Shot Learners.” In Advances in Neural\nInformation Processing Systems 33: Annual Conference on Neural\nInformation Processing Systems 2020, NeurIPS 2020, December 6-12, 2020,\nVirtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia\nHadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.\n\n\nBuolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades:\nIntersectional Accuracy Disparities in Commercial Gender\nClassification.” In Conference on Fairness, Accountability\nand Transparency, 77–91. PMLR.\n\n\nBurnet, David, and Richard Thomas. 1989. “Spycatcher:\nThe Commodification of Truth.” J. Law Soc.\n16 (2): 210. https://doi.org/10.2307/1410360.\n\n\nBurr, Geoffrey W., Matthew J. BrightSky, Abu Sebastian, Huai-Yu Cheng,\nJau-Yi Wu, Sangbum Kim, Norma E. Sosa, et al. 2016. “Recent\nProgress in Phase-Change<?Pub _Newline ?>Memory Technology.”\nIEEE Journal on Emerging and Selected Topics in Circuits and\nSystems 6 (2): 146–62. https://doi.org/10.1109/jetcas.2016.2547718.\n\n\nBushnell, Michael L, and Vishwani D Agrawal. 2002. “Built-in\nSelf-Test.” Essentials of Electronic Testing for Digital,\nMemory and Mixed-Signal VLSI Circuits, 489–548.\n\n\nBuyya, Rajkumar, Anton Beloglazov, and Jemal Abawajy. 2010.\n“Energy-Efficient Management of Data Center Resources for Cloud\nComputing: A Vision, Architectural Elements, and Open\nChallenges.” https://arxiv.org/abs/1006.0308.\n\n\nCai, Carrie J., Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel\nSmilkov, Martin Wattenberg, et al. 2019. “Human-Centered Tools for\nCoping with Imperfect Algorithms During Medical Decision-Making.”\nIn Proceedings of the 2019 CHI Conference on Human Factors in\nComputing Systems, edited by Jennifer G. Dy and Andreas Krause,\n80:2673–82. Proceedings of Machine Learning Research. ACM. https://doi.org/10.1145/3290605.3300234.\n\n\nCai, Han, Chuang Gan, Ligeng Zhu, and Song Han. 2020.\n“TinyTL: Reduce Memory, Not Parameters\nfor Efficient on-Device Learning.” In Advances in Neural\nInformation Processing Systems 33: Annual Conference on Neural\nInformation Processing Systems 2020, NeurIPS 2020, December 6-12, 2020,\nVirtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia\nHadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/81f7acabd411274fcf65ce2070ed568a-Abstract.html.\n\n\nCai, Han, Ligeng Zhu, and Song Han. 2019.\n“ProxylessNAS: Direct Neural\nArchitecture Search on Target Task and Hardware.” In 7th\nInternational Conference on Learning Representations, ICLR 2019, New\nOrleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=HylVB3AqYm.\n\n\nCalvo, Rafael A, Dorian Peters, Karina Vold, and Richard M Ryan. 2020.\n“Supporting Human Autonomy in AI Systems:\nA Framework for Ethical Enquiry.” Ethics of\nDigital Well-Being: A Multidisciplinary Approach, 31–54.\n\n\nCarlini, Nicholas, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah\nSherr, Clay Shields, David Wagner, and Wenchao Zhou. 2016. “Hidden\nVoice Commands.” In 25th USENIX Security Symposium (USENIX\nSecurity 16), 513–30.\n\n\nCarta, Salvatore, Alessandro Sebastian Podda, Diego Reforgiato Recupero,\nand Roberto Saia. 2020. “A Local Feature Engineering Strategy to\nImprove Network Anomaly Detection.” Future Internet 12\n(10): 177. https://doi.org/10.3390/fi12100177.\n\n\nCavoukian, Ann. 2009. “Privacy by Design.” Office of\nthe Information and Privacy Commissioner.\n\n\nCenci, Marcelo Pilotto, Tatiana Scarazzato, Daniel Dotto Munchen, Paula\nCristina Dartora, Hugo Marcelo Veit, Andrea Moura Bernardes, and Pablo\nR. Dias. 2021. “Eco-Friendly\nElectronicsA Comprehensive Review.”\nAdv. Mater. Technol. 7 (2): 2001263. https://doi.org/10.1002/admt.202001263.\n\n\nChallenge, WEF Net-Zero. 2021. “The Supply Chain\nOpportunity.” In World Economic Forum: Geneva,\nSwitzerland.\n\n\nChandola, Varun, Arindam Banerjee, and Vipin Kumar. 2009. “Anomaly\nDetection: A Survey.” ACM Comput. Surv. 41 (3): 1–58. https://doi.org/10.1145/1541880.1541882.\n\n\nChapelle, O., B. Scholkopf, and A. Zien Eds. 2009.\n“Semi-Supervised Learning (Chapelle, O.\nEt Al., Eds.; 2006) [Book Reviews].” IEEE Trans.\nNeural Networks 20 (3): 542–42. https://doi.org/10.1109/tnn.2009.2015974.\n\n\nChen, Chaofan, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and\nJonathan Su. 2019. “This Looks Like That: Deep\nLearning for Interpretable Image Recognition.” In Advances in\nNeural Information Processing Systems 32: Annual Conference on Neural\nInformation Processing Systems 2019, NeurIPS 2019, December 8-14, 2019,\nVancouver, BC, Canada, edited by Hanna M. Wallach, Hugo Larochelle,\nAlina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman\nGarnett, 8928–39. https://proceedings.neurips.cc/paper/2019/hash/adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html.\n\n\nChen, Emma, Shvetank Prakash, Vijay Janapa Reddi, David Kim, and Pranav\nRajpurkar. 2023. “A Framework for Integrating Artificial\nIntelligence for Clinical Care with Continuous Therapeutic\nMonitoring.” Nat. Biomed. Eng., November. https://doi.org/10.1038/s41551-023-01115-0.\n\n\nChen, H.-W. 2006. “Gallium, Indium, and Arsenic Pollution of\nGroundwater from a Semiconductor Manufacturing Area of\nTaiwan.” B. Environ. Contam. Tox. 77 (2):\n289–96. https://doi.org/10.1007/s00128-006-1062-3.\n\n\nChen, Tianqi, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan,\nHaichen Shen, Meghan Cowan, et al. 2018. “TVM:\nAn Automated End-to-End Optimizing Compiler for Deep\nLearning.” In 13th USENIX Symposium on Operating Systems\nDesign and Implementation (OSDI 18), 578–94.\n\n\nChen, Tianqi, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016.\n“Training Deep Nets with Sublinear Memory Cost.” ArXiv\nPreprint abs/1604.06174. https://arxiv.org/abs/1604.06174.\n\n\nChen, Zhiyong, and Shugong Xu. 2023. “Learning\nDomain-Heterogeneous Speaker Recognition Systems with Personalized\nContinual Federated Learning.” EURASIP Journal on Audio,\nSpeech, and Music Processing 2023 (1): 33. https://doi.org/10.1186/s13636-023-00299-2.\n\n\nChen, Zitao, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben.\n2019. “<i>BinFI</i>: An Efficient Fault Injector for\nSafety-Critical Machine Learning Systems.” In Proceedings of\nthe International Conference for High Performance Computing, Networking,\nStorage and Analysis. SC ’19. New York, NY, USA: ACM. https://doi.org/10.1145/3295500.3356177.\n\n\nChen, Zitao, Niranjhana Narayanan, Bo Fang, Guanpeng Li, Karthik\nPattabiraman, and Nathan DeBardeleben. 2020.\n“TensorFI: A Flexible Fault Injection\nFramework for TensorFlow Applications.” In 2020\nIEEE 31st International Symposium on Software Reliability Engineering\n(ISSRE), 426–35. IEEE; IEEE. https://doi.org/10.1109/issre5003.2020.00047.\n\n\nCheng, Eric, Shahrzad Mirkhani, Lukasz G. Szafaryn, Chen-Yong Cher,\nHyungmin Cho, Kevin Skadron, Mircea R. Stan, et al. 2016. “Clear:\n<U>c</u>\nRoss <u>-l</u>\nAyer <u>e</u>\nXploration for <u>a</u>\nRchitecting <u>r</u>\nEsilience - Combining Hardware and Software Techniques to Tolerate Soft\nErrors in Processor Cores.” In Proceedings of the 53rd Annual\nDesign Automation Conference, 1–6. ACM. https://doi.org/10.1145/2897937.2897996.\n\n\nCheng, Yu, Duo Wang, Pan Zhou, and Tao Zhang. 2018. “Model\nCompression and Acceleration for Deep Neural Networks: The\nPrinciples, Progress, and Challenges.” IEEE Signal Process\nMag. 35 (1): 126–36. https://doi.org/10.1109/msp.2017.2765695.\n\n\nChi, Ping, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu,\nYu Wang, and Yuan Xie. 2016. “Prime: A Novel Processing-in-Memory\nArchitecture for Neural Network Computation in ReRAM-Based Main\nMemory.” ACM SIGARCH Computer Architecture News 44 (3):\n27–39. https://doi.org/10.1145/3007787.3001140.\n\n\nChollet, François. 2018. “Introduction to Keras.” March\n9th.\n\n\nChristiano, Paul F., Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg,\nand Dario Amodei. 2017. “Deep Reinforcement Learning from Human\nPreferences.” In Advances in Neural Information Processing\nSystems 30: Annual Conference on Neural Information Processing Systems\n2017, December 4-9, 2017, Long Beach, CA, USA, edited by Isabelle\nGuyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S.\nV. N. Vishwanathan, and Roman Garnett, 4299–4307. https://proceedings.neurips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html.\n\n\nChu, Grace, Okan Arikan, Gabriel Bender, Weijun Wang, Achille Brighton,\nPieter-Jan Kindermans, Hanxiao Liu, Berkin Akin, Suyog Gupta, and Andrew\nHoward. 2021. “Discovering Multi-Hardware Mobile Models via\nArchitecture Search.” In 2021 IEEE/CVF Conference on Computer\nVision and Pattern Recognition Workshops (CVPRW), 3022–31. IEEE. https://doi.org/10.1109/cvprw53098.2021.00337.\n\n\nChua, L. 1971. “Memristor-the Missing Circuit Element.”\n#IEEE_J_CT# 18 (5): 507–19. https://doi.org/10.1109/tct.1971.1083337.\n\n\nChung, Jae-Won, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, and\nMosharaf Chowdhury. 2023. “Perseus: Removing Energy\nBloat from Large Model Training.” ArXiv Preprint\nabs/2312.06902. https://arxiv.org/abs/2312.06902.\n\n\nCohen, Maxime C., Ruben Lobel, and Georgia Perakis. 2016. “The\nImpact of Demand Uncertainty on Consumer Subsidies for Green Technology\nAdoption.” Manage. Sci. 62 (5): 1235–58. https://doi.org/10.1287/mnsc.2015.2173.\n\n\nColeman, Cody, Edward Chou, Julian Katz-Samuels, Sean Culatana, Peter\nBailis, Alexander C. Berg, Robert D. Nowak, Roshan Sumbaly, Matei\nZaharia, and I. Zeki Yalniz. 2022. “Similarity Search for\nEfficient Active Learning and Search of Rare Concepts.” In\nThirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022,\nThirty-Fourth Conference on Innovative Applications of Artificial\nIntelligence, IAAI 2022, the Twelveth Symposium on Educational Advances\nin Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March\n1, 2022, 6402–10. AAAI Press. https://ojs.aaai.org/index.php/AAAI/article/view/20591.\n\n\nColeman, Cody, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao,\nJian Zhang, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia.\n2019. “Analysis of DAWNBench, a Time-to-Accuracy\nMachine Learning Performance Benchmark.” ACM SIGOPS Operating\nSystems Review 53 (1): 14–25. https://doi.org/10.1145/3352020.3352024.\n\n\nConstantinescu, Cristian. 2008. “Intermittent Faults and Effects\non Reliability of Integrated Circuits.” In 2008 Annual\nReliability and Maintainability Symposium, 370–74. IEEE; IEEE. https://doi.org/10.1109/rams.2008.4925824.\n\n\nCooper, Tom, Suzanne Fallender, Joyann Pafumi, Jon Dettling, Sebastien\nHumbert, and Lindsay Lessard. 2011. “A Semiconductor Company’s\nExamination of Its Water Footprint Approach.” In Proceedings\nof the 2011 IEEE International Symposium on Sustainable Systems and\nTechnology, 1–6. IEEE; IEEE. https://doi.org/10.1109/issst.2011.5936865.\n\n\nCope, Gord. 2009. “Pure Water, Semiconductors and the\nRecession.” Global Water Intelligence 10 (10).\n\n\nCourbariaux, Matthieu, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and\nYoshua Bengio. 2016. “Binarized Neural Networks:\nTraining Deep Neural Networks with Weights and Activations\nConstrained to+ 1 or-1.” arXiv Preprint\narXiv:1602.02830.\n\n\nD’ignazio, Catherine, and Lauren F Klein. 2023. Data Feminism.\nMIT press.\n\n\nDahl, George E, Frank Schneider, Zachary Nado, Naman Agarwal,\nChandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, et al.\n2021. “CSF Findings in Acute NMDAR and\nLGI1 AntibodyAssociated\nAutoimmune Encephalitis.” Neurology Neuroimmunology &Amp;\nNeuroinflammation 8 (6). https://doi.org/10.1212/nxi.0000000000001086.\n\n\nDarvish Rouhani, Bita, Azalia Mirhoseini, and Farinaz Koushanfar. 2017.\n“TinyDL: Just-in-time\nDeep Learning Solution for Constrained Embedded Systems.” In\n2017 IEEE International Symposium on Circuits and Systems\n(ISCAS), 1–4. IEEE. https://doi.org/10.1109/iscas.2017.8050343.\n\n\nDavarzani, Samaneh, David Saucier, Purva Talegaonkar, Erin Parker, Alana\nTurner, Carver Middleton, Will Carroll, et al. 2023. “Closing the\nWearable Gap: Footankle\nKinematic Modeling via Deep Learning Models Based on a Smart Sock\nWearable.” Wearable Technologies 4. https://doi.org/10.1017/wtc.2023.3.\n\n\nDavid, Robert, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat\nJeffries, Jian Li, Nick Kreeger, et al. 2021. “Tensorflow Lite\nMicro: Embedded Machine Learning for Tinyml\nSystems.” Proceedings of Machine Learning and Systems 3:\n800–811.\n\n\nDavies, Emma. 2011. “Endangered Elements: Critical\nThinking.” https://www.rsc.org/images/Endangered\\%20Elements\\%20-\\%20Critical\\%20Thinking\\_tcm18-196054.pdf.\n\n\nDavies, Mike, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya,\nYongqiang Cao, Sri Harsha Choday, Georgios Dimou, et al. 2018.\n“Loihi: A Neuromorphic Manycore Processor with\non-Chip Learning.” IEEE Micro 38 (1): 82–99. https://doi.org/10.1109/mm.2018.112130359.\n\n\nDavies, Mike, Andreas Wild, Garrick Orchard, Yulia Sandamirskaya,\nGabriel A. Fonseca Guerra, Prasad Joshi, Philipp Plank, and Sumedh R.\nRisbud. 2021. “Advancing Neuromorphic Computing with Loihi:\nA Survey of Results and Outlook.” Proc.\nIEEE 109 (5): 911–34. https://doi.org/10.1109/jproc.2021.3067593.\n\n\nDavis, Jacqueline, Daniel Bizo, Andy Lawrence, Owen Rogers, and Max\nSmolaks. 2022. “Uptime Institute Global Data Center Survey\n2022.” Uptime Institute.\n\n\nDayarathna, Miyuru, Yonggang Wen, and Rui Fan. 2016. “Data Center\nEnergy Consumption Modeling: A Survey.” IEEE\nCommunications Surveys &Amp; Tutorials 18 (1): 732–94. https://doi.org/10.1109/comst.2015.2481183.\n\n\nDean, Jeffrey, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc\nV. Le, Mark Z. Mao, et al. 2012. “Large Scale Distributed Deep\nNetworks.” In Advances in Neural Information Processing\nSystems 25: 26th Annual Conference on Neural Information Processing\nSystems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake\nTahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando\nC. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q.\nWeinberger, 1232–40. https://proceedings.neurips.cc/paper/2012/hash/6aca97005c68f1206823815f66102863-Abstract.html.\n\n\nDeng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li.\n2009. “ImageNet: A Large-Scale\nHierarchical Image Database.” In 2009 IEEE Conference on\nComputer Vision and Pattern Recognition, 248–55. IEEE. https://doi.org/10.1109/cvpr.2009.5206848.\n\n\nDesai, Tanvi, Felix Ritchie, Richard Welpton, et al. 2016. “Five\nSafes: Designing Data Access for Research.”\nEconomics Working Paper Series 1601: 28.\n\n\nDevlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.\n“BERT: Pre-training of\nDeep Bidirectional Transformers for Language Understanding.” In\nProceedings of the 2019 Conference of the North, 4171–86.\nMinneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/n19-1423.\n\n\nDhar, Sauptik, Junyao Guo, Jiayi (Jason) Liu, Samarth Tripathi, Unmesh\nKurup, and Mohak Shah. 2021. “A Survey of on-Device Machine\nLearning: An Algorithms and Learning Theory Perspective.” ACM\nTransactions on Internet of Things 2 (3): 1–49. https://doi.org/10.1145/3450494.\n\n\nDong, Xin, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T.\nKung, and Ziyun Li. 2022. “SplitNets:\nDesigning Neural Architectures for Efficient Distributed\nComputing on Head-Mounted Systems.” In 2022 IEEE/CVF\nConference on Computer Vision and Pattern Recognition (CVPR),\n12549–59. IEEE. https://doi.org/10.1109/cvpr52688.2022.01223.\n\n\nDongarra, Jack J. 2009. “The Evolution of High Performance\nComputing on System z.” IBM J. Res. Dev. 53: 3–4.\n\n\nDuarte, Javier, Nhan Tran, Ben Hawks, Christian Herwig, Jules Muhizi,\nShvetank Prakash, and Vijay Janapa Reddi. 2022.\n“FastML Science Benchmarks: Accelerating\nReal-Time Scientific Edge Machine Learning.” ArXiv\nPreprint abs/2207.07958. https://arxiv.org/abs/2207.07958.\n\n\nDuchi, John C., Elad Hazan, and Yoram Singer. 2010. “Adaptive\nSubgradient Methods for Online Learning and Stochastic\nOptimization.” In COLT 2010 - the 23rd Conference on Learning\nTheory, Haifa, Israel, June 27-29, 2010, edited by Adam Tauman\nKalai and Mehryar Mohri, 257–69. Omnipress. http://colt2010.haifa.il.ibm.com/papers/COLT2010proceedings.pdf\\#page=265.\n\n\nDuisterhof, Bardienus P, Srivatsan Krishnan, Jonathan J Cruz, Colby R\nBanbury, William Fu, Aleksandra Faust, Guido CHE de Croon, and Vijay\nJanapa Reddi. 2019. “Learning to Seek: Autonomous\nSource Seeking with Deep Reinforcement Learning Onboard a Nano Drone\nMicrocontroller.” ArXiv Preprint abs/1909.11236. https://arxiv.org/abs/1909.11236.\n\n\nDuisterhof, Bardienus P., Shushuai Li, Javier Burgues, Vijay Janapa\nReddi, and Guido C. H. E. de Croon. 2021. “Sniffy Bug:\nA Fully Autonomous Swarm of Gas-Seeking Nano Quadcopters in\nCluttered Environments.” In 2021 IEEE/RSJ International\nConference on Intelligent Robots and Systems (IROS), 9099–9106.\nIEEE; IEEE. https://doi.org/10.1109/iros51168.2021.9636217.\n\n\nDwork, Cynthia, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006.\n“Calibrating Noise to Sensitivity in Private Data\nAnalysis.” In Theory of Cryptography, edited by Shai\nHalevi and Tal Rabin, 265–84. Berlin, Heidelberg: Springer Berlin\nHeidelberg.\n\n\nDwork, Cynthia, and Aaron Roth. 2013. “The Algorithmic Foundations\nof Differential Privacy.” Foundations and Trends\nin Theoretical Computer Science 9 (3-4): 211–407. https://doi.org/10.1561/0400000042.\n\n\nEbrahimi, Khosrow, Gerard F. Jones, and Amy S. Fleischer. 2014. “A\nReview of Data Center Cooling Technology, Operating Conditions and the\nCorresponding Low-Grade Waste Heat Recovery Opportunities.”\nRenewable Sustainable Energy Rev. 31 (March): 622–38. https://doi.org/10.1016/j.rser.2013.12.007.\n\n\nEgwutuoha, Ifeanyi P., David Levy, Bran Selic, and Shiping Chen. 2013.\n“A Survey of Fault Tolerance Mechanisms and Checkpoint/Restart\nImplementations for High Performance Computing Systems.” The\nJournal of Supercomputing 65 (3): 1302–26. https://doi.org/10.1007/s11227-013-0884-0.\n\n\nEisenman, Assaf, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere,\nRaghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, and\nMurali Annavaram. 2022. “Check-n-Run: A Checkpointing\nSystem for Training Deep Learning Recommendation Models.” In\n19th USENIX Symposium on Networked Systems Design and Implementation\n(NSDI 22), 929–43.\n\n\nEldan, Ronen, and Mark Russinovich. 2023. “Who’s Harry Potter?\nApproximate Unlearning in LLMs.” ArXiv\nPreprint abs/2310.02238. https://arxiv.org/abs/2310.02238.\n\n\nEl-Rayis, A. O. 2014. “Reconfigurable Architectures for the Next\nGeneration of Mobile Device Telecommunications Systems.” :\nhttps://www.researchgate.net/publication/292608967.\n\n\nEshraghian, Jason K., Max Ward, Emre O. Neftci, Xinxin Wang, Gregor\nLenz, Girish Dwivedi, Mohammed Bennamoun, Doo Seok Jeong, and Wei D. Lu.\n2023. “Training Spiking Neural Networks Using Lessons from Deep\nLearning.” Proc. IEEE 111 (9): 1016–54. https://doi.org/10.1109/jproc.2023.3308088.\n\n\nEsteva, Andre, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M.\nSwetter, Helen M. Blau, and Sebastian Thrun. 2017.\n“Dermatologist-Level Classification of Skin Cancer with Deep\nNeural Networks.” Nature 542 (7639): 115–18. https://doi.org/10.1038/nature21056.\n\n\nEykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati,\nChaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2017.\n“Robust Physical-World Attacks on Deep Learning Models.”\nArXiv Preprint abs/1707.08945. https://arxiv.org/abs/1707.08945.\n\n\nFahim, Farah, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo\nJindariani, Nhan Tran, Luca P. Carloni, et al. 2021. “Hls4ml:\nAn Open-Source Codesign Workflow to Empower Scientific\nLow-Power Machine Learning Devices.” https://arxiv.org/abs/2103.05579.\n\n\nFarah, Martha J. 2005. “Neuroethics: The Practical\nand the Philosophical.” Trends Cogn. Sci. 9 (1): 34–40.\nhttps://doi.org/10.1016/j.tics.2004.12.001.\n\n\nFarwell, James P., and Rafal Rohozinski. 2011. “Stuxnet and the\nFuture of Cyber War.” Survival 53 (1): 23–40. https://doi.org/10.1080/00396338.2011.555586.\n\n\nFowers, Jeremy, Kalin Ovtcharov, Michael Papamichael, Todd Massengill,\nMing Liu, Daniel Lo, Shlomi Alkalay, et al. 2018. “A Configurable\nCloud-Scale DNN Processor for Real-Time\nAI.” In 2018 ACM/IEEE 45th Annual International\nSymposium on Computer Architecture (ISCA), 1–14. IEEE; IEEE. https://doi.org/10.1109/isca.2018.00012.\n\n\nFrancalanza, Adrian, Luca Aceto, Antonis Achilleos, Duncan Paul Attard,\nIan Cassar, Dario Della Monica, and Anna Ingólfsdóttir. 2017. “A\nFoundation for Runtime Monitoring.” In International\nConference on Runtime Verification, 8–29. Springer.\n\n\nFrankle, Jonathan, and Michael Carbin. 2019. “The Lottery Ticket\nHypothesis: Finding Sparse, Trainable Neural\nNetworks.” In 7th International Conference on Learning\nRepresentations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.\nOpenReview.net. https://openreview.net/forum?id=rJl-b3RcF7.\n\n\nFriedman, Batya. 1996. “Value-Sensitive Design.”\nInteractions 3 (6): 16–23. https://doi.org/10.1145/242485.242493.\n\n\nFurber, Steve. 2016. “Large-Scale Neuromorphic Computing\nSystems.” J. Neural Eng. 13 (5): 051001. https://doi.org/10.1088/1741-2560/13/5/051001.\n\n\nFursov, Ivan, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun,\nRodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey\nZaytsev, and Evgeny Burnaev. 2021. “Adversarial Attacks on Deep\nModels for Financial Transaction Records.” In Proceedings of\nthe 27th ACM SIGKDD Conference on Knowledge Discovery &Amp; Data\nMining, 2868–78. ACM. https://doi.org/10.1145/3447548.3467145.\n\n\nGale, Trevor, Erich Elsen, and Sara Hooker. 2019. “The State of\nSparsity in Deep Neural Networks.” ArXiv Preprint\nabs/1902.09574. https://arxiv.org/abs/1902.09574.\n\n\nGandolfi, Karine, Christophe Mourtel, and Francis Olivier. 2001.\n“Electromagnetic Analysis: Concrete Results.”\nIn Cryptographic Hardware and Embedded SystemsCHES\n2001: Third International Workshop Paris, France, May 1416,\n2001 Proceedings 3, 251–61. Springer.\n\n\nGannot, G., and M. Ligthart. 1994. “Verilog HDL Based\nFPGA Design.” In International Verilog HDL\nConference, 86–92. IEEE. https://doi.org/10.1109/ivc.1994.323743.\n\n\nGao, Yansong, Said F. Al-Sarawi, and Derek Abbott. 2020. “Physical\nUnclonable Functions.” Nature Electronics 3 (2): 81–91.\nhttps://doi.org/10.1038/s41928-020-0372-5.\n\n\nGates, Byron D. 2009. “Flexible Electronics.”\nScience 323 (5921): 1566–67. https://doi.org/10.1126/science.1171230.\n\n\nGebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman\nVaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021.\n“Datasheets for Datasets.” Commun. ACM 64 (12):\n86–92. https://doi.org/10.1145/3458723.\n\n\nGeiger, Atticus, Hanson Lu, Thomas Icard, and Christopher Potts. 2021.\n“Causal Abstractions of Neural Networks.” In Advances\nin Neural Information Processing Systems 34: Annual Conference on Neural\nInformation Processing Systems 2021, NeurIPS 2021, December 6-14, 2021,\nVirtual, edited by Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N.\nDauphin, Percy Liang, and Jennifer Wortman Vaughan, 9574–86. https://proceedings.neurips.cc/paper/2021/hash/4f5c422f4d49a5a807eda27434231040-Abstract.html.\n\n\nGholami, Dong Kim, Mahoney Yao, and Keutzer. 2021. “A Survey of\nQuantization Methods for Efficient Neural Network Inference).”\nArXiv Preprint. https://arxiv.org/abs/2103.13630.\n\n\nGlorot, Xavier, and Yoshua Bengio. 2010. “Understanding the\nDifficulty of Training Deep Feedforward Neural Networks.” In\nProceedings of the Thirteenth International Conference on Artificial\nIntelligence and Statistics. https://proceedings.mlr.press/v9/glorot10a.html.\n\n\nGnad, Dennis R. E., Fabian Oboril, and Mehdi B. Tahoori. 2017.\n“Voltage Drop-Based Fault Attacks on FPGAs Using\nValid Bitstreams.” In 2017 27th International Conference on\nField Programmable Logic and Applications (FPL), 1–7. IEEE; IEEE.\nhttps://doi.org/10.23919/fpl.2017.8056840.\n\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David\nWarde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020.\n“Generative Adversarial Networks.” Commun. ACM 63\n(11): 139–44. https://doi.org/10.1145/3422622.\n\n\nGoodyear, Victoria A. 2017. “Social Media, Apps and Wearable\nTechnologies: Navigating Ethical Dilemmas and\nProcedures.” Qualitative Research in Sport, Exercise and\nHealth 9 (3): 285–302. https://doi.org/10.1080/2159676x.2017.1303790.\n\n\nGoogle. n.d. “Information Quality Content Moderation.” https://blog.google/documents/83/.\n\n\nGordon, Ariel, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang,\nand Edward Choi. 2018. “MorphNet: Fast\n&Amp; Simple Resource-Constrained Structure Learning of Deep\nNetworks.” In 2018 IEEE/CVF Conference on Computer Vision and\nPattern Recognition, 1586–95. IEEE. https://doi.org/10.1109/cvpr.2018.00171.\n\n\nGräfe, Ralf, Qutub Syed Sha, Florian Geissler, and Michael Paulitsch.\n2023. “Large-Scale Application of Fault Injection into\nPyTorch Models -an Extension to PyTorchFI for\nValidation Efficiency.” In 2023 53rd Annual IEEE/IFIP\nInternational Conference on Dependable Systems and Networks -\nSupplemental Volume (DSN-s), 56–62. IEEE; IEEE. https://doi.org/10.1109/dsn-s58398.2023.00025.\n\n\nGreengard, Samuel. 2015. The Internet of Things. The MIT Press.\nhttps://doi.org/10.7551/mitpress/10277.001.0001.\n\n\nGrossman, Elizabeth. 2007. High Tech Trash: Digital\nDevices, Hidden Toxics, and Human Health. Island press.\n\n\nGruslys, Audrunas, Rémi Munos, Ivo Danihelka, Marc Lanctot, and Alex\nGraves. 2016. “Memory-Efficient Backpropagation Through\nTime.” In Advances in Neural Information Processing Systems\n29: Annual Conference on Neural Information Processing Systems 2016,\nDecember 5-10, 2016, Barcelona, Spain, edited by Daniel D. Lee,\nMasashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett,\n4125–33. https://proceedings.neurips.cc/paper/2016/hash/a501bebf79d570651ff601788ea9d16d-Abstract.html.\n\n\nGu, Ivy. 2023. “Deep Learning Model Compression (Ii) by Ivy Gu\nMedium.” https://ivygdy.medium.com/deep-learning-model-compression-ii-546352ea9453.\n\n\nGuo, Chuan, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian\nWeinberger. 2019. “Simple Black-Box Adversarial Attacks.”\nIn International Conference on Machine Learning, 2484–93. PMLR.\n\n\nGuo, Yutao, Hao Wang, Hui Zhang, Tong Liu, Zhaoguang Liang, Yunlong Xia,\nLi Yan, et al. 2019. “Mobile Photoplethysmographic Technology to\nDetect Atrial Fibrillation.” J. Am. Coll. Cardiol. 74\n(19): 2365–75. https://doi.org/10.1016/j.jacc.2019.08.019.\n\n\nGupta, Maanak, Charankumar Akiri, Kshitiz Aryal, Eli Parker, and\nLopamudra Praharaj. 2023. “From ChatGPT to\nThreatGPT: Impact of Generative\nAI in Cybersecurity and Privacy.”\n#IEEE_O_ACC# 11: 80218–45. https://doi.org/10.1109/access.2023.3300381.\n\n\nGupta, Maya, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin\nCanini, Alexander Mangylov, Wojciech Moczydlowski, and Alexander Van\nEsbroeck. 2016. “Monotonic Calibrated Interpolated Look-up\nTables.” The Journal of Machine Learning Research 17\n(1): 3790–3836.\n\n\nGupta, Udit, Mariam Elgamal, Gage Hills, Gu-Yeon Wei, Hsien-Hsin S. Lee,\nDavid Brooks, and Carole-Jean Wu. 2022. “Act: Designing\nSustainable Computer Systems with an Architectural Carbon Modeling\nTool.” In Proceedings of the 49th Annual International\nSymposium on Computer Architecture, 784–99. ACM. https://doi.org/10.1145/3470496.3527408.\n\n\nGwennap, Linley. n.d. “Certus-NX Innovates\nGeneral-Purpose FPGAs.”\n\n\nHaensch, Wilfried, Tayfun Gokmen, and Ruchir Puri. 2019. “The Next\nGeneration of Deep Learning Hardware: Analog\nComputing.” Proc. IEEE 107 (1): 108–22. https://doi.org/10.1109/jproc.2018.2871057.\n\n\nHamming, R. W. 1950. “Error Detecting and Error Correcting\nCodes.” Bell Syst. Tech. J. 29 (2): 147–60. https://doi.org/10.1002/j.1538-7305.1950.tb00463.x.\n\n\nHan, Song, Huizi Mao, and William J Dally. 2015. “Deep\nCompression: Compressing Deep Neural Networks with Pruning, Trained\nQuantization and Huffman Coding.” arXiv Preprint\narXiv:1510.00149.\n\n\nHan, Song, Huizi Mao, and William J. Dally. 2016. “Deep\nCompression: Compressing Deep Neural Networks with Pruning,\nTrained Quantization and Huffman Coding.” https://arxiv.org/abs/1510.00149.\n\n\nHandlin, Oscar. 1965. “Science and Technology in Popular\nCulture.” Daedalus-Us., 156–70.\n\n\nHardt, Moritz, Eric Price, and Nati Srebro. 2016. “Equality of\nOpportunity in Supervised Learning.” In Advances in Neural\nInformation Processing Systems 29: Annual Conference on Neural\nInformation Processing Systems 2016, December 5-10, 2016, Barcelona,\nSpain, edited by Daniel D. Lee, Masashi Sugiyama, Ulrike von\nLuxburg, Isabelle Guyon, and Roman Garnett, 3315–23. https://proceedings.neurips.cc/paper/2016/hash/9d2682367c3935defcb1f9e247a97c0d-Abstract.html.\n\n\nHawks, Benjamin, Javier Duarte, Nicholas J. Fraser, Alessandro\nPappalardo, Nhan Tran, and Yaman Umuroglu. 2021. “Ps and Qs: Quantization-aware Pruning for Efficient Low\nLatency Neural Network Inference.” Frontiers in Artificial\nIntelligence 4 (July). https://doi.org/10.3389/frai.2021.676564.\n\n\nHazan, Avi, and Elishai Ezra Tsur. 2021. “Neuromorphic Analog\nImplementation of Neural Engineering Framework-Inspired Spiking Neuron\nfor High-Dimensional Representation.” Front. Neurosci.\n15 (February): 627221. https://doi.org/10.3389/fnins.2021.627221.\n\n\nHe, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015.\n“Delving Deep into Rectifiers: Surpassing Human-Level\nPerformance on ImageNet Classification.” In 2015\nIEEE International Conference on Computer Vision (ICCV), 1026–34.\nIEEE. https://doi.org/10.1109/iccv.2015.123.\n\n\n———. 2016. “Deep Residual Learning for Image Recognition.”\nIn 2016 IEEE Conference on Computer Vision and Pattern Recognition\n(CVPR), 770–78. IEEE. https://doi.org/10.1109/cvpr.2016.90.\n\n\nHe, Yi, Prasanna Balaprakash, and Yanjing Li. 2020.\n“FIdelity: Efficient Resilience Analysis\nFramework for Deep Learning Accelerators.” In 2020 53rd\nAnnual IEEE/ACM International Symposium on Microarchitecture\n(MICRO), 270–81. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00033.\n\n\nHe, Yi, Mike Hutton, Steven Chan, Robert De Gruijl, Rama Govindaraju,\nNishant Patil, and Yanjing Li. 2023. “Understanding and Mitigating\nHardware Failures in Deep Learning Training Systems.” In\nProceedings of the 50th Annual International Symposium on Computer\nArchitecture, 1–16. IEEE; ACM. https://doi.org/10.1145/3579371.3589105.\n\n\nHébert-Johnson, Úrsula, Michael P. Kim, Omer Reingold, and Guy N.\nRothblum. 2018. “Multicalibration: Calibration for\nthe (Computationally-Identifiable) Masses.” In\nProceedings of the 35th International Conference on Machine\nLearning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15,\n2018, edited by Jennifer G. Dy and Andreas Krause, 80:1944–53.\nProceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v80/hebert-johnson18a.html.\n\n\nHegde, Sumant. 2023. “An Introduction to Separable Convolutions -\nAnalytics Vidhya.” https://www.analyticsvidhya.com/blog/2021/11/an-introduction-to-separable-convolutions/.\n\n\nHenderson, Peter, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky,\nand Joelle Pineau. 2020. “Towards the Systematic Reporting of the\nEnergy and Carbon Footprints of Machine Learning.” The\nJournal of Machine Learning Research 21 (1): 10039–81.\n\n\nHendrycks, Dan, and Thomas Dietterich. 2019. “Benchmarking Neural\nNetwork Robustness to Common Corruptions and Perturbations.”\narXiv Preprint arXiv:1903.12261.\n\n\nHendrycks, Dan, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn\nSong. 2021. “Natural Adversarial Examples.” In 2021\nIEEE/CVF Conference on Computer Vision and Pattern Recognition\n(CVPR), 15262–71. IEEE. https://doi.org/10.1109/cvpr46437.2021.01501.\n\n\nHennessy, John L., and David A. Patterson. 2019. “A New Golden Age\nfor Computer Architecture.” Commun. ACM 62 (2): 48–60.\nhttps://doi.org/10.1145/3282307.\n\n\nHimmelstein, Gracie, David Bates, and Li Zhou. 2022. “Examination\nof Stigmatizing Language in the Electronic Health Record.”\nJAMA Network Open 5 (1): e2144967. https://doi.org/10.1001/jamanetworkopen.2021.44967.\n\n\nHinton, Geoffrey. 2005. “Van Nostrand’s Scientific Encyclopedia.” Wiley.\nhttps://doi.org/10.1002/0471743984.vse0673.\n\n\n———. 2017. “Overview of Minibatch Gradient Descent.”\nUniversity of Toronto; University Lecture.\n\n\nHo Yoon, Jung, Hyung-Suk Jung, Min Hwan Lee, Gun Hwan Kim, Seul Ji Song,\nJun Yeong Seok, Kyung Jean Yoon, et al. 2012. “Frontiers in\nElectronic Materials.” Wiley. https://doi.org/10.1002/9783527667703.ch67.\n\n\nHoefler, Torsten, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and\nAlexandra Peste. 2021. “Sparsity in Deep Learning:\nPruning and Growth for Efficient Inference and Training in\nNeural Networks.” https://arxiv.org/abs/2102.00554.\n\n\nHolland, Sarah, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia\nChmielinski. 2020. “The Dataset Nutrition Label: A Framework to\nDrive Higher Data Quality Standards.” In Data Protection and\nPrivacy. Hart Publishing. https://doi.org/10.5040/9781509932771.ch-001.\n\n\nHong, Sanghyun, Nicholas Carlini, and Alexey Kurakin. 2023.\n“Publishing Efficient on-Device Models Increases Adversarial\nVulnerability.” In 2023 IEEE Conference on Secure and\nTrustworthy Machine Learning (SaTML), 271–90. IEEE; IEEE. https://doi.org/10.1109/satml54575.2023.00026.\n\n\nHosseini, Hossein, Sreeram Kannan, Baosen Zhang, and Radha Poovendran.\n2017. “Deceiving Google’s Perspective Api Built for Detecting\nToxic Comments.” ArXiv Preprint abs/1702.08138. https://arxiv.org/abs/1702.08138.\n\n\nHoward, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun\nWang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017.\n“MobileNets: Efficient Convolutional\nNeural Networks for Mobile Vision Applications.” ArXiv\nPreprint. https://arxiv.org/abs/1704.04861.\n\n\nHsiao, Yu-Shun, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman\nMahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, and Vijay\nJanapa Reddi. 2023. “MAVFI: An\nEnd-to-End Fault Analysis Framework with Anomaly Detection and Recovery\nfor Micro Aerial Vehicles.” In 2023 Design, Automation\n&Amp; Test in Europe Conference &Amp; Exhibition (DATE),\n1–6. IEEE; IEEE. https://doi.org/10.23919/date56975.2023.10137246.\n\n\nHsu, Liang-Ching, Ching-Yi Huang, Yen-Hsun Chuang, Ho-Wen Chen, Ya-Ting\nChan, Heng Yi Teah, Tsan-Yao Chen, Chiung-Fen Chang, Yu-Ting Liu, and\nYu-Min Tzou. 2016. “Accumulation of Heavy Metals and Trace\nElements in Fluvial Sediments Received Effluents from Traditional and\nSemiconductor Industries.” Scientific Reports 6 (1):\n34250. https://doi.org/10.1038/srep34250.\n\n\nHu, Jie, Li Shen, and Gang Sun. 2018. “Squeeze-and-Excitation\nNetworks.” In 2018 IEEE/CVF Conference on Computer Vision and\nPattern Recognition, 7132–41. IEEE. https://doi.org/10.1109/cvpr.2018.00745.\n\n\nHu, Yang, Jie Jiang, Lifu Zhang, Yunfeng Shi, and Jian Shi. 2023.\n“Halide Perovskite Semiconductors.” Wiley. https://doi.org/10.1002/9783527829026.ch13.\n\n\nHuang, Tsung-Ching, Kenjiro Fukuda, Chun-Ming Lo, Yung-Hui Yeh, Tsuyoshi\nSekitani, Takao Someya, and Kwang-Ting Cheng. 2011.\n“Pseudo-CMOS: A Design Style for\nLow-Cost and Robust Flexible Electronics.” IEEE Trans.\nElectron Devices 58 (1): 141–50. https://doi.org/10.1109/ted.2010.2088127.\n\n\nHutter, Michael, Jorn-Marc Schmidt, and Thomas Plos. 2009.\n“Contact-Based Fault Injections and Power Analysis on\nRFID Tags.” In 2009 European Conference on\nCircuit Theory and Design, 409–12. IEEE; IEEE. https://doi.org/10.1109/ecctd.2009.5275012.\n\n\nIandola, Forrest N, Song Han, Matthew W Moskewicz, Khalid Ashraf,\nWilliam J Dally, and Kurt Keutzer. 2016. “SqueezeNet:\nAlexnet-level Accuracy with 50x Fewer\nParameters and 0.5 MB Model Size.” ArXiv\nPreprint abs/1602.07360. https://arxiv.org/abs/1602.07360.\n\n\nIgnatov, Andrey, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim\nHartley, and Luc Van Gool. 2018. “AI Benchmark:\nRunning Deep Neural Networks on Android\nSmartphones,” 0–0.\n\n\nImani, Mohsen, Abbas Rahimi, and Tajana S. Rosing. 2016.\n“Resistive Configurable Associative Memory for Approximate\nComputing.” In Proceedings of the 2016 Design, Automation\n&Amp; Test in Europe Conference &Amp; Exhibition (DATE),\n1327–32. IEEE; Research Publishing Services. https://doi.org/10.3850/9783981537079_0454.\n\n\nIntelLabs. 2023. “Knowledge Distillation - Neural Network\nDistiller.” https://intellabs.github.io/distiller/knowledge_distillation.html.\n\n\nIppolito, Daphne, Florian Tramer, Milad Nasr, Chiyuan Zhang, Matthew\nJagielski, Katherine Lee, Christopher Choquette Choo, and Nicholas\nCarlini. 2023. “Preventing Generation of Verbatim Memorization in\nLanguage Models Gives a False Sense of Privacy.” In\nProceedings of the 16th International Natural Language Generation\nConference, 5253–70. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.inlg-main.3.\n\n\nIrimia-Vladu, Mihai. 2014.\n““Green” Electronics:\nBiodegradable and Biocompatible Materials and Devices for\nSustainable Future.” Chem. Soc. Rev. 43 (2): 588–610. https://doi.org/10.1039/c3cs60235d.\n\n\nIsscc. 2014. “Computing’s Energy Problem (and What We Can Do about\nIt).” https://ieeexplore.ieee.org/document/6757323.\n\n\nJacob, Benoit, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang,\nAndrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018.\n“Quantization and Training of Neural Networks for Efficient\nInteger-Arithmetic-Only Inference.” In 2018 IEEE/CVF\nConference on Computer Vision and Pattern Recognition, 2704–13.\nIEEE. https://doi.org/10.1109/cvpr.2018.00286.\n\n\nJanapa Reddi, Vijay, Alexander Elium, Shawn Hymel, David Tischler,\nDaniel Situnayake, Carl Ward, Louis Moreau, et al. 2023. “Edge\nImpulse: An MLOps Platform for Tiny Machine\nLearning.” Proceedings of Machine Learning and Systems\n5.\n\n\nJha, A. R. 2014. Rare Earth Materials: Properties and\nApplications. CRC Press. https://doi.org/10.1201/b17045.\n\n\nJha, Saurabh, Subho Banerjee, Timothy Tsai, Siva K. S. Hari, Michael B.\nSullivan, Zbigniew T. Kalbarczyk, Stephen W. Keckler, and Ravishankar K.\nIyer. 2019. “ML-Based Fault Injection for Autonomous\nVehicles: A Case for Bayesian Fault\nInjection.” In 2019 49th Annual IEEE/IFIP International\nConference on Dependable Systems and Networks (DSN), 112–24. IEEE;\nIEEE. https://doi.org/10.1109/dsn.2019.00025.\n\n\nJia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan\nLong, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014.\n“Caffe: Convolutional Architecture for Fast Feature\nEmbedding.” In Proceedings of the 22nd ACM International\nConference on Multimedia, 675–78. ACM. https://doi.org/10.1145/2647868.2654889.\n\n\nJia, Zhe, Marco Maggioni, Benjamin Staiger, and Daniele P. Scarpazza.\n2018. “Dissecting the NVIDIA Volta\nGPU Architecture via Microbenchmarking.” ArXiv\nPreprint. https://arxiv.org/abs/1804.06826.\n\n\nJia, Zhenge, Dawei Li, Xiaowei Xu, Na Li, Feng Hong, Lichuan Ping, and\nYiyu Shi. 2023. “Life-Threatening Ventricular Arrhythmia Detection\nChallenge in Implantable\nCardioverterdefibrillators.” Nature Machine\nIntelligence 5 (5): 554–55. https://doi.org/10.1038/s42256-023-00659-9.\n\n\nJia, Zhihao, Matei Zaharia, and Alex Aiken. 2019. “Beyond Data and\nModel Parallelism for Deep Neural Networks.” In Proceedings\nof Machine Learning and Systems 2019, MLSys 2019, Stanford, CA, USA,\nMarch 31 - April 2, 2019, edited by Ameet Talwalkar, Virginia\nSmith, and Matei Zaharia. mlsys.org. https://proceedings.mlsys.org/book/265.pdf.\n\n\nJin, Yilun, Xiguang Wei, Yang Liu, and Qiang Yang. 2020. “Towards\nUtilizing Unlabeled Data in Federated Learning: A Survey\nand Prospective.” arXiv Preprint arXiv:2002.11545.\n\n\nJohnson-Roberson, Matthew, Charles Barto, Rounak Mehta, Sharath Nittur\nSridhar, Karl Rosaen, and Ram Vasudevan. 2017. “Driving in the\nMatrix: Can Virtual Worlds Replace Human-Generated\nAnnotations for Real World Tasks?” In 2017 IEEE International\nConference on Robotics and Automation (ICRA), 746–53. Singapore,\nSingapore: IEEE. https://doi.org/10.1109/icra.2017.7989092.\n\n\nJouppi, Norman P., Cliff Young, Nishant Patil, David Patterson, Gaurav\nAgrawal, Raminder Bajwa, Sarah Bates, et al. 2017a. “In-Datacenter\nPerformance Analysis of a Tensor Processing Unit.” In\nProceedings of the 44th Annual International Symposium on Computer\nArchitecture, 1–12. ISCA ’17. New York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\n\n\n———, et al. 2017b. “In-Datacenter Performance Analysis of a Tensor\nProcessing Unit.” In Proceedings of the 44th Annual\nInternational Symposium on Computer Architecture, 1–12. ISCA ’17.\nNew York, NY, USA: ACM. https://doi.org/10.1145/3079856.3080246.\n\n\nJouppi, Norm, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng\nNai, Nishant Patil, et al. 2023. “TPU V4:\nAn Optically Reconfigurable Supercomputer for Machine\nLearning with Hardware Support for Embeddings.” In\nProceedings of the 50th Annual International Symposium on Computer\nArchitecture. ISCA ’23. New York, NY, USA: ACM. https://doi.org/10.1145/3579371.3589350.\n\n\nJoye, Marc, and Michael Tunstall. 2012. Fault Analysis in\nCryptography. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-29656-7.\n\n\nKairouz, Peter, Sewoong Oh, and Pramod Viswanath. 2015. “Secure\nMulti-Party Differential Privacy.” In Advances in Neural\nInformation Processing Systems 28: Annual Conference on Neural\nInformation Processing Systems 2015, December 7-12, 2015, Montreal,\nQuebec, Canada, edited by Corinna Cortes, Neil D. Lawrence, Daniel\nD. Lee, Masashi Sugiyama, and Roman Garnett, 2008–16. https://proceedings.neurips.cc/paper/2015/hash/a01610228fe998f515a72dd730294d87-Abstract.html.\n\n\nKalamkar, Dhiraj, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das,\nKunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, et al. 2019.\n“A Study of BFLOAT16 for Deep Learning\nTraining.” https://arxiv.org/abs/1905.12322.\n\n\nKao, Sheng-Chun, Geonhwa Jeong, and Tushar Krishna. 2020.\n“ConfuciuX: Autonomous Hardware Resource\nAssignment for DNN Accelerators Using Reinforcement\nLearning.” In 2020 53rd Annual IEEE/ACM International\nSymposium on Microarchitecture (MICRO), 622–36. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00058.\n\n\nKao, Sheng-Chun, and Tushar Krishna. 2020. “Gamma: Automating the\nHW Mapping of DNN Models on Accelerators via Genetic Algorithm.”\nIn Proceedings of the 39th International Conference on\nComputer-Aided Design, 1–9. ACM. https://doi.org/10.1145/3400302.3415639.\n\n\nKaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin\nChess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario\nAmodei. 2020. “Scaling Laws for Neural Language Models.”\nArXiv Preprint abs/2001.08361. https://arxiv.org/abs/2001.08361.\n\n\nKarargyris, Alexandros, Renato Umeton, Micah J Sheller, Alejandro\nAristizabal, Johnu George, Anna Wuest, Sarthak Pati, et al. 2023.\n“Federated Benchmarking of Medical Artificial Intelligence with\nMedPerf.” Nature Machine Intelligence 5\n(7): 799–810. https://doi.org/10.1038/s42256-023-00652-2.\n\n\nKaur, Harmanpreet, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna\nWallach, and Jennifer Wortman Vaughan. 2020. “Interpreting\nInterpretability: Understanding Data Scientists’ Use of\nInterpretability Tools for Machine Learning.” In Proceedings\nof the 2020 CHI Conference on Human Factors in Computing Systems,\nedited by Regina Bernhaupt, Florian ’Floyd’Mueller, David Verweij, Josh\nAndres, Joanna McGrenere, Andy Cockburn, Ignacio Avellino, et al., 1–14.\nACM. https://doi.org/10.1145/3313831.3376219.\n\n\nKawazoe Aguilera, Marcos, Wei Chen, and Sam Toueg. 1997.\n“Heartbeat: A Timeout-Free Failure Detector for\nQuiescent Reliable Communication.” In Distributed Algorithms:\n11th International Workshop, WDAG’97 Saarbrücken, Germany, September\n2426, 1997 Proceedings 11, 126–40. Springer.\n\n\nKhan, Mohammad Emtiyaz, and Siddharth Swaroop. 2021.\n“Knowledge-Adaptation Priors.” In Advances in Neural\nInformation Processing Systems 34: Annual Conference on Neural\nInformation Processing Systems 2021, NeurIPS 2021, December 6-14, 2021,\nVirtual, edited by Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N.\nDauphin, Percy Liang, and Jennifer Wortman Vaughan, 19757–70. https://proceedings.neurips.cc/paper/2021/hash/a4380923dd651c195b1631af7c829187-Abstract.html.\n\n\nKiela, Douwe, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger,\nZhengxuan Wu, Bertie Vidgen, et al. 2021. “Dynabench:\nRethinking Benchmarking in NLP.” In\nProceedings of the 2021 Conference of the North American Chapter of\nthe Association for Computational Linguistics: Human Language\nTechnologies, 4110–24. Online: Association for Computational\nLinguistics. https://doi.org/10.18653/v1/2021.naacl-main.324.\n\n\nKim, Jungrae, Michael Sullivan, and Mattan Erez. 2015. “Bamboo\nECC: Strong, Safe, and Flexible Codes for\nReliable Computer Memory.” In 2015 IEEE 21st International\nSymposium on High Performance Computer Architecture (HPCA), 101–12.\nIEEE; IEEE. https://doi.org/10.1109/hpca.2015.7056025.\n\n\nKim, Sunju, Chungsik Yoon, Seunghon Ham, Jihoon Park, Ohun Kwon, Donguk\nPark, Sangjun Choi, Seungwon Kim, Kwonchul Ha, and Won Kim. 2018.\n“Chemical Use in the Semiconductor Manufacturing Industry.”\nInt. J. Occup. Env. Heal. 24 (3-4): 109–18. https://doi.org/10.1080/10773525.2018.1519957.\n\n\nKingma, Diederik P., and Jimmy Ba. 2015. “Adam: A\nMethod for Stochastic Optimization.” In 3rd International\nConference on Learning Representations, ICLR 2015, San Diego, CA, USA,\nMay 7-9, 2015, Conference Track Proceedings, edited by Yoshua\nBengio and Yann LeCun. http://arxiv.org/abs/1412.6980.\n\n\nKirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness,\nGuillaume Desjardins, Andrei A. Rusu, Kieran Milan, et al. 2017.\n“Overcoming Catastrophic Forgetting in Neural Networks.”\nProc. Natl. Acad. Sci. 114 (13): 3521–26. https://doi.org/10.1073/pnas.1611835114.\n\n\nKo, Yohan. 2021. “Characterizing System-Level Masking Effects\nAgainst Soft Errors.” Electronics 10 (18): 2286. https://doi.org/10.3390/electronics10182286.\n\n\nKocher, Paul, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss,\nWerner Haas, Mike Hamburg, et al. 2019a. “Spectre Attacks:\nExploiting Speculative Execution.” In 2019 IEEE\nSymposium on Security and Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\n\n\n———, et al. 2019b. “Spectre Attacks: Exploiting\nSpeculative Execution.” In 2019 IEEE Symposium on Security\nand Privacy (SP). IEEE. https://doi.org/10.1109/sp.2019.00002.\n\n\nKocher, Paul, Joshua Jaffe, and Benjamin Jun. 1999. “Differential\nPower Analysis.” In Advances in\nCryptologyCRYPTO’99: 19th Annual International Cryptology\nConference Santa Barbara, California, USA, August 1519,\n1999 Proceedings 19, 388–97. Springer.\n\n\nKocher, Paul, Joshua Jaffe, Benjamin Jun, and Pankaj Rohatgi. 2011.\n“Introduction to Differential Power Analysis.” Journal\nof Cryptographic Engineering 1 (1): 5–27. https://doi.org/10.1007/s13389-011-0006-y.\n\n\nKoh, Pang Wei, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma\nPierson, Been Kim, and Percy Liang. 2020. “Concept Bottleneck\nModels.” In Proceedings of the 37th International Conference\non Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event,\n119:5338–48. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v119/koh20a.html.\n\n\nKoh, Pang Wei, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin\nZhang, Akshay Balsubramani, Weihua Hu, et al. 2021.\n“WILDS: A Benchmark of in-the-Wild\nDistribution Shifts.” In Proceedings of the 38th\nInternational Conference on Machine Learning, ICML 2021, 18-24 July\n2021, Virtual Event, edited by Marina Meila and Tong Zhang,\n139:5637–64. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/koh21a.html.\n\n\nKoren, Yehuda, Robert Bell, and Chris Volinsky. 2009. “Matrix\nFactorization Techniques for Recommender Systems.”\nComputer 42 (8): 30–37. https://doi.org/10.1109/mc.2009.263.\n\n\nKrishna, Adithya, Srikanth Rohit Nudurupati, Chandana D G, Pritesh\nDwivedi, André van Schaik, Mahesh Mehendale, and Chetan Singh Thakur.\n2023. “RAMAN: A Re-Configurable and\nSparse TinyML Accelerator for Inference on Edge.” https://arxiv.org/abs/2306.06493.\n\n\nKrishnamoorthi. 2018. “Quantizing Deep Convolutional Networks for\nEfficient Inference: A Whitepaper.” ArXiv\nPreprint. https://arxiv.org/abs/1806.08342.\n\n\nKrishnan, Rayan, Pranav Rajpurkar, and Eric J. Topol. 2022.\n“Self-Supervised Learning in Medicine and Healthcare.”\nNat. Biomed. Eng. 6 (12): 1346–52. https://doi.org/10.1038/s41551-022-00914-1.\n\n\nKrishnan, Srivatsan, Natasha Jaques, Shayegan Omidshafiei, Dan Zhang,\nIzzeddin Gur, Vijay Janapa Reddi, and Aleksandra Faust. 2022.\n“Multi-Agent Reinforcement Learning for Microprocessor Design\nSpace Exploration.” https://arxiv.org/abs/2211.16385.\n\n\nKrishnan, Srivatsan, Amir Yazdanbakhsh, Shvetank Prakash, Jason Jabbour,\nIkechukwu Uchendu, Susobhan Ghosh, Behzad Boroujerdian, et al. 2023.\n“ArchGym: An Open-Source Gymnasium for\nMachine Learning Assisted Architecture Design.” In\nProceedings of the 50th Annual International Symposium on Computer\nArchitecture, 1–16. ACM. https://doi.org/10.1145/3579371.3589049.\n\n\nKrizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. 2012a.\n“ImageNet Classification with Deep Convolutional Neural\nNetworks.” Edited by F. Pereira, C. J. Burges, L. Bottou, and K.\nQ. Weinberger 25. https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.\n\n\nKrizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012b.\n“ImageNet Classification with Deep Convolutional\nNeural Networks.” In Advances in Neural Information\nProcessing Systems 25: 26th Annual Conference on Neural Information\nProcessing Systems 2012. Proceedings of a Meeting Held December 3-6,\n2012, Lake Tahoe, Nevada, United States, edited by Peter L.\nBartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou,\nand Kilian Q. Weinberger, 1106–14. https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.\n\n\nKung, Hsiang Tsung, and Charles E Leiserson. 1979. “Systolic\nArrays (for VLSI).” In Sparse Matrix Proceedings\n1978, 1:256–82. Society for industrial; applied mathematics\nPhiladelphia, PA, USA.\n\n\nKurth, Thorsten, Shashank Subramanian, Peter Harrington, Jaideep Pathak,\nMorteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, and Anima\nAnandkumar. 2023. “FourCastNet:\nAccelerating Global High-Resolution Weather Forecasting\nUsing Adaptive Fourier Neural Operators.” In\nProceedings of the Platform for Advanced Scientific Computing\nConference, 1–11. ACM. https://doi.org/10.1145/3592979.3593412.\n\n\nKuzmin, Andrey, Mart Van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters,\nand Tijmen Blankevoort. 2022. “FP8 Quantization:\nThe Power of the Exponent.” https://arxiv.org/abs/2208.09225.\n\n\nKwon, Jisu, and Daejin Park. 2021. “Hardware/Software\nCo-Design for TinyML Voice-Recognition Application on\nResource Frugal Edge Devices.” Applied Sciences 11 (22):\n11073. https://doi.org/10.3390/app112211073.\n\n\nKwon, Sun Hwa, and Lin Dong. 2022. “Flexible Sensors and Machine\nLearning for Heart Monitoring.” Nano Energy 102\n(November): 107632. https://doi.org/10.1016/j.nanoen.2022.107632.\n\n\nKwon, Young D, Rui Li, Stylianos I Venieris, Jagmohan Chauhan, Nicholas\nD Lane, and Cecilia Mascolo. 2023. “TinyTrain:\nDeep Neural Network Training at the Extreme Edge.”\nArXiv Preprint abs/2307.09988. https://arxiv.org/abs/2307.09988.\n\n\nLai, Liangzhen, Naveen Suda, and Vikas Chandra. 2018a. “Cmsis-Nn:\nEfficient Neural Network Kernels for Arm Cortex-m\nCpus.” ArXiv Preprint abs/1801.06601. https://arxiv.org/abs/1801.06601.\n\n\n———. 2018b. “CMSIS-NN:\nEfficient Neural Network Kernels for Arm Cortex-m\nCPUs.” https://arxiv.org/abs/1801.06601.\n\n\nLakkaraju, Himabindu, and Osbert Bastani. 2020. “”How\nDo i Fool You?”: Manipulating User Trust via Misleading Black Box\nExplanations.” In Proceedings of the AAAI/ACM Conference on\nAI, Ethics, and Society, 79–85. ACM. https://doi.org/10.1145/3375627.3375833.\n\n\nLam, Remi, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger,\nMeire Fortunato, Ferran Alet, Suman Ravuri, et al. 2023. “Learning\nSkillful Medium-Range Global Weather Forecasting.”\nScience 382 (6677): 1416–21. https://doi.org/10.1126/science.adi2336.\n\n\nLannelongue, Loı̈c, Jason Grealey, and Michael Inouye. 2021. “Green\nAlgorithms: Quantifying the Carbon Footprint of\nComputation.” Adv. Sci. 8 (12): 2100707. https://doi.org/10.1002/advs.202100707.\n\n\nLeCun, Yann, John Denker, and Sara Solla. 1989. “Optimal Brain\nDamage.” Adv Neural Inf Process Syst 2.\n\n\nLee, Minwoong, Namho Lee, Huijeong Gwon, Jongyeol Kim, Younggwan Hwang,\nand Seongik Cho. 2022. “Design of Radiation-Tolerant High-Speed\nSignal Processing Circuit for Detecting Prompt Gamma Rays by Nuclear\nExplosion.” Electronics 11 (18): 2970. https://doi.org/10.3390/electronics11182970.\n\n\nLeRoy Poff, N, MM Brinson, and JW Day. 2002. “Aquatic Ecosystems\n& Global Climate Change.” Pew Center on Global Climate\nChange.\n\n\nLi, En, Liekang Zeng, Zhi Zhou, and Xu Chen. 2020. “Edge\nAI: On-demand Accelerating Deep\nNeural Network Inference via Edge Computing.” IEEE Trans.\nWireless Commun. 19 (1): 447–57. https://doi.org/10.1109/twc.2019.2946140.\n\n\nLi, Guanpeng, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai,\nKarthik Pattabiraman, Joel Emer, and Stephen W. Keckler. 2017.\n“Understanding Error Propagation in Deep Learning Neural Network\n(DNN) Accelerators and Applications.” In\nProceedings of the International Conference for High Performance\nComputing, Networking, Storage and Analysis, 1–12. ACM. https://doi.org/10.1145/3126908.3126964.\n\n\nLi, Jingzhen, Igbe Tobore, Yuhang Liu, Abhishek Kandwal, Lei Wang, and\nZedong Nie. 2021. “Non-Invasive Monitoring of Three Glucose Ranges\nBased on ECG by Using\nDBSCAN-CNN.” #IEEE_J_BHI# 25\n(9): 3340–50. https://doi.org/10.1109/jbhi.2021.3072628.\n\n\nLi, Mu, David G. Andersen, Alexander J. Smola, and Kai Yu. 2014.\n“Communication Efficient Distributed Machine Learning with the\nParameter Server.” In Advances in Neural Information\nProcessing Systems 27: Annual Conference on Neural Information\nProcessing Systems 2014, December 8-13 2014, Montreal, Quebec,\nCanada, edited by Zoubin Ghahramani, Max Welling, Corinna Cortes,\nNeil D. Lawrence, and Kilian Q. Weinberger, 19–27. https://proceedings.neurips.cc/paper/2014/hash/1ff1de774005f8da13f42943881c655f-Abstract.html.\n\n\nLi, Qinbin, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu,\nand Bingsheng He. 2023. “A Survey on Federated Learning Systems:\nVision, Hype and Reality for Data Privacy and\nProtection.” IEEE Trans. Knowl. Data Eng. 35 (4):\n3347–66. https://doi.org/10.1109/tkde.2021.3124599.\n\n\nLi, Tian, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. 2020.\n“Federated Learning: Challenges, Methods, and Future\nDirections.” IEEE Signal Process Mag. 37 (3): 50–60. https://doi.org/10.1109/msp.2020.2975749.\n\n\nLi, Xiang, Tao Qin, Jian Yang, and Tie-Yan Liu. 2016.\n“LightRNN: Memory and\nComputation-Efficient Recurrent Neural Networks.” In Advances\nin Neural Information Processing Systems 29: Annual Conference on Neural\nInformation Processing Systems 2016, December 5-10, 2016, Barcelona,\nSpain, edited by Daniel D. Lee, Masashi Sugiyama, Ulrike von\nLuxburg, Isabelle Guyon, and Roman Garnett, 4385–93. https://proceedings.neurips.cc/paper/2016/hash/c3e4035af2a1cde9f21e1ae1951ac80b-Abstract.html.\n\n\nLi, Yuhang, Xin Dong, and Wei Wang. 2020. “Additive Powers-of-Two\nQuantization: An Efficient Non-Uniform Discretization for\nNeural Networks.” In 8th International Conference on Learning\nRepresentations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30,\n2020. OpenReview.net. https://openreview.net/forum?id=BkgXT24tDS.\n\n\nLi, Zhizhong, and Derek Hoiem. 2018. “Learning Without\nForgetting.” IEEE Trans. Pattern Anal. Mach. Intell. 40\n(12): 2935–47. https://doi.org/10.1109/tpami.2017.2773081.\n\n\nLin, Ji, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han.\n2020. “MCUNet: Tiny Deep Learning on\nIoT Devices.” In Advances in Neural Information\nProcessing Systems 33: Annual Conference on Neural Information\nProcessing Systems 2020, NeurIPS 2020, December 6-12, 2020,\nVirtual, edited by Hugo Larochelle, Marc’Aurelio Ranzato, Raia\nHadsell, Maria-Florina Balcan, and Hsuan-Tien Lin. https://proceedings.neurips.cc/paper/2020/hash/86c51678350f656dcc7f490a43946ee5-Abstract.html.\n\n\nLin, Ji, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, and Song\nHan. 2022. “On-Device Training Under 256kb Memory.”\nAdv. Neur. In. 35: 22941–54.\n\n\nLin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona,\nDeva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014.\n“Microsoft Coco: Common Objects in Context.”\nIn Computer VisionECCV 2014: 13th European Conference,\nZurich, Switzerland, September 6-12, 2014, Proceedings, Part v 13,\n740–55. Springer.\n\n\nLindgren, Simon. 2023. Handbook of Critical Studies of Artificial\nIntelligence. Edward Elgar Publishing.\n\n\nLindholm, Andreas, Dave Zachariah, Petre Stoica, and Thomas B. Schon.\n2019. “Data Consistency Approach to Model Validation.”\n#IEEE_O_ACC# 7: 59788–96. https://doi.org/10.1109/access.2019.2915109.\n\n\nLindholm, Erik, John Nickolls, Stuart Oberman, and John Montrym. 2008.\n“NVIDIA Tesla: A Unified Graphics and\nComputing Architecture.” IEEE Micro 28 (2): 39–55. https://doi.org/10.1109/mm.2008.31.\n\n\nLin, Tang Tang, Dang Yang, and Han Gan. 2023. “AWQ:\nActivation-aware Weight Quantization for\nLLM Compression and Acceleration.” ArXiv\nPreprint. https://arxiv.org/abs/2306.00978.\n\n\nLiu, Yanan, Xiaoxia Wei, Jinyu Xiao, Zhijie Liu, Yang Xu, and Yun Tian.\n2020. “Energy Consumption and Emission Mitigation Prediction Based\non Data Center Traffic and PUE for Global Data\nCenters.” Global Energy Interconnection 3 (3): 272–82.\nhttps://doi.org/10.1016/j.gloei.2020.07.008.\n\n\nLiu, Yingcheng, Guo Zhang, Christopher G. Tarolli, Rumen Hristov, Stella\nJensen-Roberts, Emma M. Waddell, Taylor L. Myers, et al. 2022.\n“Monitoring Gait at Home with Radio Waves in\nParkinson’s Disease: A Marker of Severity,\nProgression, and Medication Response.” Sci. Transl. Med.\n14 (663): eadc9669. https://doi.org/10.1126/scitranslmed.adc9669.\n\n\nLoh, Gabriel H. 2008. “3D-Stacked Memory\nArchitectures for Multi-Core Processors.” ACM SIGARCH\nComputer Architecture News 36 (3): 453–64. https://doi.org/10.1145/1394608.1382159.\n\n\nLopez-Paz, David, and Marc’Aurelio Ranzato. 2017. “Gradient\nEpisodic Memory for Continual Learning.” Adv Neural Inf\nProcess Syst 30.\n\n\nLou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. 2013.\n“Accurate Intelligible Models with Pairwise Interactions.”\nIn Proceedings of the 19th ACM SIGKDD International Conference on\nKnowledge Discovery and Data Mining, edited by Inderjit S. Dhillon,\nYehuda Koren, Rayid Ghani, Ted E. Senator, Paul Bradley, Rajesh Parekh,\nJingrui He, Robert L. Grossman, and Ramasamy Uthurusamy, 623–31. ACM. https://doi.org/10.1145/2487575.2487579.\n\n\nLowy, Andrew, Rakesh Pavan, Sina Baharlouei, Meisam Razaviyayn, and\nAhmad Beirami. 2021. “Fermi: Fair Empirical Risk\nMinimization via Exponential Rényi Mutual Information.”\n\n\nLuebke, David. 2008. “CUDA: Scalable\nParallel Programming for High-Performance Scientific Computing.”\nIn 2008 5th IEEE International Symposium on Biomedical Imaging: From\nNano to Macro, 836–38. IEEE. https://doi.org/10.1109/isbi.2008.4541126.\n\n\nLundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to\nInterpreting Model Predictions.” In Advances in Neural\nInformation Processing Systems 30: Annual Conference on Neural\nInformation Processing Systems 2017, December 4-9, 2017, Long Beach, CA,\nUSA, edited by Isabelle Guyon, Ulrike von Luxburg, Samy Bengio,\nHanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett,\n4765–74. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.\n\n\nMa, Dongning, Fred Lin, Alban Desmaison, Joel Coburn, Daniel Moore,\nSriram Sankar, and Xun Jiao. 2024. “Dr.\nDNA: Combating Silent Data Corruptions in Deep\nLearning Using Distribution of Neuron Activations.” In\nProceedings of the 29th ACM International Conference on\nArchitectural Support for Programming Languages and Operating Systems,\nVolume 3, 239–52. ACM. https://doi.org/10.1145/3620666.3651349.\n\n\nMaas, Martin, David G. Andersen, Michael Isard, Mohammad Mahdi\nJavanmard, Kathryn S. McKinley, and Colin Raffel. 2024. “Combining\nMachine Learning and Lifetime-Based Resource Management for Memory\nAllocation and Beyond.” Commun. ACM 67 (4): 87–96. https://doi.org/10.1145/3611018.\n\n\nMaass, Wolfgang. 1997. “Networks of Spiking Neurons:\nThe Third Generation of Neural Network Models.”\nNeural Networks 10 (9): 1659–71. https://doi.org/10.1016/s0893-6080(97)00011-7.\n\n\nMadry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras,\nand Adrian Vladu. 2017. “Towards Deep Learning Models Resistant to\nAdversarial Attacks.” arXiv Preprint arXiv:1706.06083.\n\n\nMahmoud, Abdulrahman, Neeraj Aggarwal, Alex Nobbe, Jose Rodrigo Sanchez\nVicarte, Sarita V. Adve, Christopher W. Fletcher, Iuri Frosio, and Siva\nKumar Sastry Hari. 2020. “PyTorchFI: A\nRuntime Perturbation Tool for DNNs.” In 2020\n50th Annual IEEE/IFIP International Conference on Dependable Systems and\nNetworks Workshops (DSN-w), 25–31. IEEE; IEEE. https://doi.org/10.1109/dsn-w50199.2020.00014.\n\n\nMahmoud, Abdulrahman, Siva Kumar Sastry Hari, Christopher W. Fletcher,\nSarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael\nB. Sullivan, Timothy Tsai, and Stephen W. Keckler. 2021.\n“Optimizing Selective Protection for CNN\nResilience.” In 2021 IEEE 32nd International Symposium on\nSoftware Reliability Engineering (ISSRE), 127–38. IEEE. https://doi.org/10.1109/issre52982.2021.00025.\n\n\nMahmoud, Abdulrahman, Thierry Tambe, Tarek Aloui, David Brooks, and\nGu-Yeon Wei. 2022. “GoldenEye: A\nPlatform for Evaluating Emerging Numerical Data Formats in\nDNN Accelerators.” In 2022 52nd Annual IEEE/IFIP\nInternational Conference on Dependable Systems and Networks (DSN),\n206–14. IEEE. https://doi.org/10.1109/dsn53405.2022.00031.\n\n\nMarković, Danijela, Alice Mizrahi, Damien Querlioz, and Julie Grollier.\n2020. “Physics for Neuromorphic Computing.” Nature\nReviews Physics 2 (9): 499–510. https://doi.org/10.1038/s42254-020-0208-2.\n\n\nMartin, C. Dianne. 1993. “The Myth of the Awesome Thinking\nMachine.” Commun. ACM 36 (4): 120–33. https://doi.org/10.1145/255950.153587.\n\n\nMarulli, Fiammetta, Stefano Marrone, and Laura Verde. 2022.\n“Sensitivity of Machine Learning Approaches to Fake and Untrusted\nData in Healthcare Domain.” Journal of Sensor and Actuator\nNetworks 11 (2): 21. https://doi.org/10.3390/jsan11020021.\n\n\nMaslej, Nestor, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy,\nKatrina Ligett, Terah Lyons, James Manyika, et al. 2023.\n“Artificial Intelligence Index Report 2023.” ArXiv\nPreprint abs/2310.03715. https://arxiv.org/abs/2310.03715.\n\n\nMattson, Peter, Vijay Janapa Reddi, Christine Cheng, Cody Coleman, Greg\nDiamos, David Kanter, Paulius Micikevicius, et al. 2020a.\n“MLPerf: An Industry Standard Benchmark\nSuite for Machine Learning Performance.” IEEE Micro 40\n(2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\n\n\n———, et al. 2020b. “MLPerf: An Industry\nStandard Benchmark Suite for Machine Learning Performance.”\nIEEE Micro 40 (2): 8–16. https://doi.org/10.1109/mm.2020.2974843.\n\n\nMazumder, Mark, Sharad Chitlangia, Colby Banbury, Yiping Kang, Juan\nManuel Ciro, Keith Achorn, Daniel Galvez, et al. 2021.\n“Multilingual Spoken Words Corpus.” In Thirty-Fifth\nConference on Neural Information Processing Systems Datasets and\nBenchmarks Track (Round 2).\n\n\nMcCarthy, John. 1981. “Epistemological Problems of Artificial\nIntelligence.” In Readings in Artificial Intelligence,\n459–65. Elsevier. https://doi.org/10.1016/b978-0-934613-03-3.50035-0.\n\n\nMcMahan, Brendan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise\nAgüera y Arcas. 2017. “Communication-Efficient Learning of Deep\nNetworks from Decentralized Data.” In Proceedings of the 20th\nInternational Conference on Artificial Intelligence and Statistics,\nAISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, edited by\nAarti Singh and Xiaojin (Jerry) Zhu, 54:1273–82. Proceedings of Machine\nLearning Research. PMLR. http://proceedings.mlr.press/v54/mcmahan17a.html.\n\n\nMiller, Charlie. 2019. “Lessons Learned from Hacking a\nCar.” IEEE Design &Amp; Test 36 (6): 7–9. https://doi.org/10.1109/mdat.2018.2863106.\n\n\nMiller, Charlie, and Chris Valasek. 2015. “Remote Exploitation of\nan Unaltered Passenger Vehicle.” Black Hat USA 2015 (S\n91): 1–91.\n\n\nMiller, D. A. B. 2000. “Optical Interconnects to Silicon.”\n#IEEE_J_JSTQE# 6 (6): 1312–17. https://doi.org/10.1109/2944.902184.\n\n\nMills, Andrew, and Stephen Le Hunte. 1997. “An Overview of\nSemiconductor Photocatalysis.” J. Photochem. Photobiol.,\nA 108 (1): 1–35. https://doi.org/10.1016/s1010-6030(97)00118-4.\n\n\nMirhoseini, Azalia, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang,\nEbrahim Songhori, Shen Wang, Young-Joon Lee, et al. 2021. “A Graph\nPlacement Methodology for Fast Chip Design.” Nature 594\n(7862): 207–12. https://doi.org/10.1038/s41586-021-03544-w.\n\n\nMishra, Asit K., Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan\nStosic, Ganesh Venkatesh, Chong Yu, and Paulius Micikevicius. 2021.\n“Accelerating Sparse Deep Neural Networks.” CoRR\nabs/2104.08378. https://arxiv.org/abs/2104.08378.\n\n\nMittal, Sparsh, Gaurav Verma, Brajesh Kaushik, and Farooq A. Khanday.\n2021. “A Survey of SRAM-Based in-Memory Computing\nTechniques and Applications.” J. Syst. Architect. 119\n(October): 102276. https://doi.org/10.1016/j.sysarc.2021.102276.\n\n\nModha, Dharmendra S., Filipp Akopyan, Alexander Andreopoulos,\nRathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta,\net al. 2023. “Neural Inference at the Frontier of Energy, Space,\nand Time.” Science 382 (6668): 329–35. https://doi.org/10.1126/science.adh1174.\n\n\nMohanram, K., and N. A. Touba. 2003. “Partial Error Masking to\nReduce Soft Error Failure Rate in Logic Circuits.” In\nProceedings. 16th IEEE Symposium on Computer Arithmetic,\n433–40. IEEE; IEEE Comput. Soc. https://doi.org/10.1109/dftvs.2003.1250141.\n\n\nMonyei, Chukwuka G., and Kirsten E. H. Jenkins. 2018. “Electrons\nHave No Identity: Setting Right Misrepresentations in\nGoogle and Apple’s Clean Energy Purchasing.”\nEnergy Research &Amp; Social Science 46 (December): 48–51.\nhttps://doi.org/10.1016/j.erss.2018.06.015.\n\n\nMoshawrab, Mohammad, Mehdi Adda, Abdenour Bouzouane, Hussein Ibrahim,\nand Ali Raad. 2023. “Reviewing Federated Learning Aggregation\nAlgorithms; Strategies, Contributions, Limitations and Future\nPerspectives.” Electronics 12 (10): 2287. https://doi.org/10.3390/electronics12102287.\n\n\nMukherjee, S. S., J. Emer, and S. K. Reinhardt. 2005. “The Soft\nError Problem: An Architectural Perspective.” In\n11th International Symposium on High-Performance Computer\nArchitecture, 243–47. IEEE; IEEE. https://doi.org/10.1109/hpca.2005.37.\n\n\nMunshi, Aaftab. 2009. “The OpenCL\nSpecification.” In 2009 IEEE Hot Chips 21 Symposium\n(HCS), 1–314. IEEE. https://doi.org/10.1109/hotchips.2009.7478342.\n\n\nMusk, Elon et al. 2019. “An Integrated Brain-Machine Interface\nPlatform with Thousands of Channels.” J. Med. Internet\nRes. 21 (10): e16194. https://doi.org/10.2196/16194.\n\n\nMyllyaho, Lalli, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen,\nand Tommi Mikkonen. 2022. “On Misbehaviour and Fault Tolerance in\nMachine Learning Systems.” J. Syst. Software 183\n(January): 111096. https://doi.org/10.1016/j.jss.2021.111096.\n\n\nNakano, Jane. 2021. The Geopolitics of Critical Minerals Supply\nChains. JSTOR.\n\n\nNarayanan, Arvind, and Vitaly Shmatikov. 2006. “How to Break\nAnonymity of the Netflix Prize Dataset.” arXiv Preprint\nCs/0610105.\n\n\nNg, Davy Tsz Kit, Jac Ka Lok Leung, Kai Wah Samuel Chu, and Maggie Shen\nQiao. 2021. “AI Literacy: Definition,\nTeaching, Evaluation and Ethical Issues.” Proceedings of the\nAssociation for Information Science and Technology 58 (1): 504–9.\n\n\nNgo, Richard, Lawrence Chan, and Sören Mindermann. 2022. “The\nAlignment Problem from a Deep Learning Perspective.” ArXiv\nPreprint abs/2209.00626. https://arxiv.org/abs/2209.00626.\n\n\nNguyen, Ngoc-Bao, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and\nNgai-Man Cheung. 2023. “Re-Thinking Model Inversion Attacks\nAgainst Deep Neural Networks.” In 2023 IEEE/CVF Conference on\nComputer Vision and Pattern Recognition (CVPR), 16384–93. IEEE. https://doi.org/10.1109/cvpr52729.2023.01572.\n\n\nNorrie, Thomas, Nishant Patil, Doe Hyun Yoon, George Kurian, Sheng Li,\nJames Laudon, Cliff Young, Norman Jouppi, and David Patterson. 2021.\n“The Design Process for Google’s Training Chips:\nTpuv2 and TPUv3.” IEEE Micro\n41 (2): 56–63. https://doi.org/10.1109/mm.2021.3058217.\n\n\nNorthcutt, Curtis G, Anish Athalye, and Jonas Mueller. 2021.\n“Pervasive Label Errors in Test Sets Destabilize Machine Learning\nBenchmarks.” arXiv. https://doi.org/https://doi.org/10.48550/arXiv.2103.14749\narXiv-issued DOI via DataCite.\n\n\nObermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil\nMullainathan. 2019. “Dissecting Racial Bias in an Algorithm Used\nto Manage the Health of Populations.” Science 366\n(6464): 447–53. https://doi.org/10.1126/science.aax2342.\n\n\nOecd. 2023. “A Blueprint for Building National Compute Capacity\nfor Artificial Intelligence.” 350. Organisation for Economic\nCo-Operation; Development (OECD). https://doi.org/10.1787/876367e3-en.\n\n\nOlah, Chris, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael\nPetrov, and Shan Carter. 2020. “Zoom in: An\nIntroduction to Circuits.” Distill 5 (3): e00024–001. https://doi.org/10.23915/distill.00024.001.\n\n\nOliynyk, Daryna, Rudolf Mayer, and Andreas Rauber. 2023. “I Know\nWhat You Trained Last Summer: A Survey on Stealing Machine\nLearning Models and Defences.” ACM Comput. Surv. 55\n(14s): 1–41. https://doi.org/10.1145/3595292.\n\n\nOoko, Samson Otieno, Marvin Muyonga Ogore, Jimmy Nsenga, and Marco\nZennaro. 2021. “TinyML in Africa:\nOpportunities and Challenges.” In 2021 IEEE\nGlobecom Workshops (GC Wkshps), 1–6. IEEE; IEEE. https://doi.org/10.1109/gcwkshps52748.2021.9682107.\n\n\nOprea, Alina, Anoop Singhal, and Apostol Vassilev. 2022.\n“Poisoning Attacks Against Machine Learning: Can\nMachine Learning Be Trustworthy?” Computer 55 (11):\n94–99. https://doi.org/10.1109/mc.2022.3190787.\n\n\nPan, Sinno Jialin, and Qiang Yang. 2010. “A Survey on Transfer\nLearning.” IEEE Trans. Knowl. Data Eng. 22 (10):\n1345–59. https://doi.org/10.1109/tkde.2009.191.\n\n\nPanda, Priyadarshini, Indranil Chakraborty, and Kaushik Roy. 2019.\n“Discretization Based Solutions for Secure Machine Learning\nAgainst Adversarial Attacks.” #IEEE_O_ACC# 7: 70157–68.\nhttps://doi.org/10.1109/access.2019.2919463.\n\n\nPapadimitriou, George, and Dimitris Gizopoulos. 2021.\n“Demystifying the System Vulnerability Stack:\nTransient Fault Effects Across the Layers.” In\n2021 ACM/IEEE 48th Annual International Symposium on Computer\nArchitecture (ISCA), 902–15. IEEE; IEEE. https://doi.org/10.1109/isca52012.2021.00075.\n\n\nPapernot, Nicolas, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram\nSwami. 2016. “Distillation as a Defense to Adversarial\nPerturbations Against Deep Neural Networks.” In 2016 IEEE\nSymposium on Security and Privacy (SP), 582–97. IEEE; IEEE. https://doi.org/10.1109/sp.2016.41.\n\n\nParrish, Alicia, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max\nBartolo, Oana Inel, Juan Ciro, et al. 2023. “Adversarial Nibbler:\nA Data-Centric Challenge for Improving the Safety of\nText-to-Image Models.” ArXiv Preprint abs/2305.14384. https://arxiv.org/abs/2305.14384.\n\n\nPatterson, David A, and John L Hennessy. 2016. Computer Organization\nand Design ARM Edition: The Hardware Software\nInterface. Morgan kaufmann.\n\n\nPatterson, David, Joseph Gonzalez, Urs Holzle, Quoc Le, Chen Liang,\nLluis-Miquel Munguia, Daniel Rothchild, David R. So, Maud Texier, and\nJeff Dean. 2022. “The Carbon Footprint of Machine Learning\nTraining Will Plateau, Then Shrink.” Computer 55 (7):\n18–28. https://doi.org/10.1109/mc.2022.3148714.\n\n\nPeters, Dorian, Rafael A. Calvo, and Richard M. Ryan. 2018.\n“Designing for Motivation, Engagement and Wellbeing in Digital\nExperience.” Front. Psychol. 9 (May): 797. https://doi.org/10.3389/fpsyg.2018.00797.\n\n\nPhillips, P Jonathon, Carina A Hahn, Peter C Fontana, David A\nBroniatowski, and Mark A Przybocki. 2020. “Four Principles of\nExplainable Artificial Intelligence.” Gaithersburg,\nMaryland 18.\n\n\nPlank, James S. 1997. “A Tutorial on\nReedSolomon Coding for Fault-Tolerance in\nRAID-Like Systems.” Software: Practice and\nExperience 27 (9): 995–1012.\n\n\nPont, Michael J, and Royan HL Ong. 2002. “Using Watchdog Timers to\nImprove the Reliability of Single-Processor Embedded Systems:\nSeven New Patterns and a Case Study.” In\nProceedings of the First Nordic Conference on Pattern Languages of\nPrograms, 159–200. Citeseer.\n\n\nPrakash, Shvetank, Tim Callahan, Joseph Bushagour, Colby Banbury, Alan\nV. Green, Pete Warden, Tim Ansell, and Vijay Janapa Reddi. 2023.\n“CFU Playground: Full-stack Open-Source Framework for Tiny Machine\nLearning (TinyML) Acceleration on\nFPGAs.” In 2023 IEEE International Symposium on\nPerformance Analysis of Systems and Software (ISPASS). Vol.\nabs/2201.01863. IEEE. https://doi.org/10.1109/ispass57527.2023.00024.\n\n\nPrakash, Shvetank, Matthew Stewart, Colby Banbury, Mark Mazumder, Pete\nWarden, Brian Plancher, and Vijay Janapa Reddi. 2023. “Is\nTinyML Sustainable? Assessing the Environmental Impacts of\nMachine Learning on Microcontrollers.” ArXiv Preprint.\nhttps://arxiv.org/abs/2301.11899.\n\n\nPsoma, Sotiria D., and Chryso Kanthou. 2023. “Wearable Insulin\nBiosensors for Diabetes Management: Advances and\nChallenges.” Biosensors 13 (7): 719. https://doi.org/10.3390/bios13070719.\n\n\nPushkarna, Mahima, Andrew Zaldivar, and Oddur Kjartansson. 2022.\n“Data Cards: Purposeful and Transparent Dataset\nDocumentation for Responsible AI.” In 2022 ACM\nConference on Fairness, Accountability, and Transparency. ACM. https://doi.org/10.1145/3531146.3533231.\n\n\nPutnam, Andrew, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros\nConstantinides, John Demme, Hadi Esmaeilzadeh, et al. 2014. “A\nReconfigurable Fabric for Accelerating Large-Scale Datacenter\nServices.” ACM SIGARCH Computer Architecture News 42\n(3): 13–24. https://doi.org/10.1145/2678373.2665678.\n\n\nQi, Chen, Shibo Shen, Rongpeng Li, Zhifeng Zhao, Qing Liu, Jing Liang,\nand Honggang Zhang. 2021. “An Efficient Pruning Scheme of Deep\nNeural Networks for Internet of Things Applications.” EURASIP\nJournal on Advances in Signal Processing 2021 (1). https://doi.org/10.1186/s13634-021-00744-4.\n\n\nQian, Yu, Xuegong Zhou, Hao Zhou, and Lingli Wang. 2024. “An\nEfficient Reinforcement Learning Based Framework for Exploring Logic\nSynthesis.” ACM Trans. Des. Autom. Electron. Syst. 29\n(2): 1–33. https://doi.org/10.1145/3632174.\n\n\nR. V., Rashmi, and Karthikeyan A. 2018. “Secure Boot of Embedded\nApplications - a Review.” In 2018 Second International\nConference on Electronics, Communication and Aerospace Technology\n(ICECA), 291–98. IEEE. https://doi.org/10.1109/iceca.2018.8474730.\n\n\nRaina, Rajat, Anand Madhavan, and Andrew Y. Ng. 2009. “Large-Scale\nDeep Unsupervised Learning Using Graphics Processors.” In\nProceedings of the 26th Annual International Conference on Machine\nLearning, edited by Andrea Pohoreckyj Danyluk, Léon Bottou, and\nMichael L. Littman, 382:873–80. ACM International Conference Proceeding\nSeries. ACM. https://doi.org/10.1145/1553374.1553486.\n\n\nRamaswamy, Vikram V., Sunnie S. Y. Kim, Ruth Fong, and Olga Russakovsky.\n2023a. “Overlooked Factors in Concept-Based Explanations:\nDataset Choice, Concept Learnability, and Human\nCapability.” In 2023 IEEE/CVF Conference on Computer Vision\nand Pattern Recognition (CVPR), 10932–41. IEEE. https://doi.org/10.1109/cvpr52729.2023.01052.\n\n\nRamaswamy, Vikram V, Sunnie SY Kim, Ruth Fong, and Olga Russakovsky.\n2023b. “UFO: A Unified Method for\nControlling Understandability and Faithfulness Objectives in\nConcept-Based Explanations for CNNs.” ArXiv\nPreprint abs/2303.15632. https://arxiv.org/abs/2303.15632.\n\n\nRamcharan, Amanda, Kelsee Baranowski, Peter McCloskey, Babuali Ahmed,\nJames Legg, and David P. Hughes. 2017. “Deep Learning for\nImage-Based Cassava Disease Detection.” Front. Plant\nSci. 8 (October): 1852. https://doi.org/10.3389/fpls.2017.01852.\n\n\nRamesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss,\nAlec Radford, Mark Chen, and Ilya Sutskever. 2021. “Zero-Shot\nText-to-Image Generation.” In Proceedings of the 38th\nInternational Conference on Machine Learning, ICML 2021, 18-24 July\n2021, Virtual Event, edited by Marina Meila and Tong Zhang,\n139:8821–31. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/ramesh21a.html.\n\n\nRanganathan, Parthasarathy. 2011. “From Microprocessors to\nNanostores: Rethinking Data-Centric Systems.”\nComputer 44 (1): 39–48. https://doi.org/10.1109/mc.2011.18.\n\n\nRao, Ravi. 2021. Www.wevolver.com. https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies.\n\n\nRashid, Layali, Karthik Pattabiraman, and Sathish Gopalakrishnan. 2012.\n“Intermittent Hardware Errors Recovery: Modeling and\nEvaluation.” In 2012 Ninth International Conference on\nQuantitative Evaluation of Systems, 220–29. IEEE; IEEE. https://doi.org/10.1109/qest.2012.37.\n\n\n———. 2015. “Characterizing the Impact of Intermittent Hardware\nFaults on Programs.” IEEE Trans. Reliab. 64 (1):\n297–310. https://doi.org/10.1109/tr.2014.2363152.\n\n\nRatner, Alex, Braden Hancock, Jared Dunnmon, Roger Goldman, and\nChristopher Ré. 2018. “Snorkel MeTaL: Weak\nSupervision for Multi-Task Learning.” In Proceedings of the\nSecond Workshop on Data Management for End-to-End Machine Learning.\nACM. https://doi.org/10.1145/3209889.3209898.\n\n\nReagen, Brandon, Udit Gupta, Lillian Pentecost, Paul Whatmough, Sae Kyu\nLee, Niamh Mulholland, David Brooks, and Gu-Yeon Wei. 2018. “Ares:\nA Framework for Quantifying the Resilience of Deep Neural\nNetworks.” In 2018 55th ACM/ESDA/IEEE Design Automation\nConference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465834.\n\n\nReagen, Brandon, Jose Miguel Hernandez-Lobato, Robert Adolf, Michael\nGelbart, Paul Whatmough, Gu-Yeon Wei, and David Brooks. 2017. “A\nCase for Efficient Accelerator Design Space Exploration via\nBayesian Optimization.” In 2017 IEEE/ACM\nInternational Symposium on Low Power Electronics and Design\n(ISLPED), 1–6. IEEE; IEEE. https://doi.org/10.1109/islped.2017.8009208.\n\n\nReddi, Vijay Janapa, Christine Cheng, David Kanter, Peter Mattson,\nGuenther Schmuelling, Carole-Jean Wu, Brian Anderson, et al. 2020.\n“MLPerf Inference Benchmark.” In 2020\nACM/IEEE 47th Annual International Symposium on Computer Architecture\n(ISCA), 446–59. IEEE; IEEE. https://doi.org/10.1109/isca45697.2020.00045.\n\n\nReddi, Vijay Janapa, and Meeta Sharma Gupta. 2013. Resilient\nArchitecture Design for Voltage Variation. Springer International\nPublishing. https://doi.org/10.1007/978-3-031-01739-1.\n\n\nReis, G. A., J. Chang, N. Vachharajani, R. Rangan, and D. I. August.\n2005. “SWIFT: Software Implemented Fault\nTolerance.” In International Symposium on Code Generation and\nOptimization, 243–54. IEEE; IEEE. https://doi.org/10.1109/cgo.2005.34.\n\n\nRibeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016.\n“” Why Should i Trust You?” Explaining\nthe Predictions of Any Classifier.” In Proceedings of the\n22nd ACM SIGKDD International Conference on Knowledge Discovery and Data\nMining, 1135–44.\n\n\nRobbins, Herbert, and Sutton Monro. 1951. “A Stochastic\nApproximation Method.” The Annals of Mathematical\nStatistics 22 (3): 400–407. https://doi.org/10.1214/aoms/1177729586.\n\n\nRombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and\nBjorn Ommer. 2022. “High-Resolution Image Synthesis with Latent\nDiffusion Models.” In 2022 IEEE/CVF Conference on Computer\nVision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr52688.2022.01042.\n\n\nRosa, Gustavo H. de, and João P. Papa. 2021. “A Survey on Text\nGeneration Using Generative Adversarial Networks.” Pattern\nRecogn. 119 (November): 108098. https://doi.org/10.1016/j.patcog.2021.108098.\n\n\nRosenblatt, Frank. 1957. The Perceptron, a Perceiving and\nRecognizing Automaton Project Para. Cornell Aeronautical\nLaboratory.\n\n\nRoskies, Adina. 2002. “Neuroethics for the New Millenium.”\nNeuron 35 (1): 21–23. https://doi.org/10.1016/s0896-6273(02)00763-8.\n\n\nRuder, Sebastian. 2016. “An Overview of Gradient Descent\nOptimization Algorithms.” ArXiv Preprint abs/1609.04747.\nhttps://arxiv.org/abs/1609.04747.\n\n\nRudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning\nModels for High Stakes Decisions and Use Interpretable Models\nInstead.” Nature Machine Intelligence 1 (5): 206–15. https://doi.org/10.1038/s42256-019-0048-x.\n\n\nRumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986.\n“Learning Representations by Back-Propagating Errors.”\nNature 323 (6088): 533–36. https://doi.org/10.1038/323533a0.\n\n\nRussakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh,\nSean Ma, Zhiheng Huang, et al. 2015. “ImageNet Large\nScale Visual Recognition Challenge.” Int. J. Comput.\nVision 115 (3): 211–52. https://doi.org/10.1007/s11263-015-0816-y.\n\n\nRussell, Stuart. 2021. “Human-Compatible Artificial\nIntelligence.” Human-Like Machine Intelligence, 3–23.\n\n\nRyan, Richard M., and Edward L. Deci. 2000. “Self-Determination\nTheory and the Facilitation of Intrinsic Motivation, Social Development,\nand Well-Being.” Am. Psychol. 55 (1): 68–78. https://doi.org/10.1037/0003-066x.55.1.68.\n\n\nSamajdar, Ananda, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar\nKrishna. 2018. “Scale-Sim: Systolic Cnn Accelerator\nSimulator.” ArXiv Preprint abs/1811.02883. https://arxiv.org/abs/1811.02883.\n\n\nSambasivan, Nithya, Shivani Kapania, Hannah Highfill, Diana Akrong,\nPraveen Paritosh, and Lora M Aroyo. 2021a.\n““Everyone Wants to Do the Model Work,\nNot the Data Work”: Data Cascades in\nHigh-Stakes AI.” In Proceedings of the 2021 CHI\nConference on Human Factors in Computing Systems, 1–15. ACM. https://doi.org/10.1145/3411764.3445518.\n\n\n———. 2021b. ““Everyone Wants to Do the\nModel Work, Not the Data Work”: Data Cascades\nin High-Stakes AI.” In Proceedings of the 2021\nCHI Conference on Human Factors in Computing Systems. CHI ’21. New\nYork, NY, USA: ACM. https://doi.org/10.1145/3411764.3445518.\n\n\n———. 2021c. ““Everyone Wants to Do the\nModel Work, Not the Data Work”: Data Cascades\nin High-Stakes AI.” In Proceedings of the 2021\nCHI Conference on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/3411764.3445518.\n\n\nSangchoolie, Behrooz, Karthik Pattabiraman, and Johan Karlsson. 2017.\n“One Bit Is (Not) Enough: An Empirical\nStudy of the Impact of Single and Multiple Bit-Flip Errors.” In\n2017 47th Annual IEEE/IFIP International Conference on Dependable\nSystems and Networks (DSN), 97–108. IEEE; IEEE. https://doi.org/10.1109/dsn.2017.30.\n\n\nSchäfer, Mike S. 2023. “The Notorious GPT:\nScience Communication in the Age of Artificial\nIntelligence.” Journal of Science Communication 22 (02):\nY02. https://doi.org/10.22323/2.22020402.\n\n\nSchizas, Nikolaos, Aristeidis Karras, Christos Karras, and Spyros\nSioutas. 2022. “TinyML for Ultra-Low Power\nAI and Large Scale IoT Deployments:\nA Systematic Review.” Future Internet 14\n(12): 363. https://doi.org/10.3390/fi14120363.\n\n\nSchuman, Catherine D., Shruti R. Kulkarni, Maryam Parsa, J. Parker\nMitchell, Prasanna Date, and Bill Kay. 2022. “Opportunities for\nNeuromorphic Computing Algorithms and Applications.” Nature\nComputational Science 2 (1): 10–19. https://doi.org/10.1038/s43588-021-00184-y.\n\n\nSchwartz, Daniel, Jonathan Michael Gomes Selman, Peter Wrege, and\nAndreas Paepcke. 2021. “Deployment of Embedded\nEdge-AI for Wildlife Monitoring in Remote Regions.”\nIn 2021 20th IEEE International Conference on Machine Learning and\nApplications (ICMLA), 1035–42. IEEE; IEEE. https://doi.org/10.1109/icmla52953.2021.00170.\n\n\nSchwartz, Roy, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2020.\n“Green AI.” Commun. ACM 63 (12):\n54–63. https://doi.org/10.1145/3381831.\n\n\nSegal, Mark, and Kurt Akeley. 1999. “The OpenGL\nGraphics System: A Specification (Version 1.1).”\n\n\nSegura Anaya, L. H., Abeer Alsadoon, N. Costadopoulos, and P. W. C.\nPrasad. 2017. “Ethical Implications of User Perceptions of\nWearable Devices.” Sci. Eng. Ethics 24 (1): 1–28. https://doi.org/10.1007/s11948-017-9872-8.\n\n\nSeide, Frank, and Amit Agarwal. 2016. “Cntk: Microsoft’s\nOpen-Source Deep-Learning Toolkit.” In Proceedings of the\n22nd ACM SIGKDD International Conference on Knowledge Discovery and Data\nMining, 2135–35. ACM. https://doi.org/10.1145/2939672.2945397.\n\n\nSelvaraju, Ramprasaath R., Michael Cogswell, Abhishek Das, Ramakrishna\nVedantam, Devi Parikh, and Dhruv Batra. 2017.\n“Grad-CAM: Visual Explanations from Deep\nNetworks via Gradient-Based Localization.” In 2017 IEEE\nInternational Conference on Computer Vision (ICCV), 618–26. IEEE.\nhttps://doi.org/10.1109/iccv.2017.74.\n\n\nSeong, Nak Hee, Dong Hyuk Woo, Vijayalakshmi Srinivasan, Jude A. Rivers,\nand Hsien-Hsin S. Lee. 2010. “SAFER: Stuck-at-fault Error Recovery for\nMemories.” In 2010 43rd Annual IEEE/ACM International\nSymposium on Microarchitecture, 115–24. IEEE; IEEE. https://doi.org/10.1109/micro.2010.46.\n\n\nSeyedzadeh, Saleh, Farzad Pour Rahimian, Ivan Glesk, and Marc Roper.\n2018. “Machine Learning for Estimation of Building Energy\nConsumption and Performance: A Review.”\nVisualization in Engineering 6 (1): 1–20. https://doi.org/10.1186/s40327-018-0064-7.\n\n\nShalev-Shwartz, Shai, Shaked Shammah, and Amnon Shashua. 2017. “On\na Formal Model of Safe and Scalable Self-Driving Cars.” ArXiv\nPreprint abs/1708.06374. https://arxiv.org/abs/1708.06374.\n\n\nShan, Shawn, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y\nZhao. 2023. “Prompt-Specific Poisoning Attacks on Text-to-Image\nGenerative Models.” ArXiv Preprint abs/2310.13828. https://arxiv.org/abs/2310.13828.\n\n\nShastri, Bhavin J., Alexander N. Tait, T. Ferreira de Lima, Wolfram H.\nP. Pernice, Harish Bhaskaran, C. D. Wright, and Paul R. Prucnal. 2021.\n“Photonics for Artificial Intelligence and Neuromorphic\nComputing.” Nat. Photonics 15 (2): 102–14. https://doi.org/10.1038/s41566-020-00754-y.\n\n\nSheaffer, Jeremy W, David P Luebke, and Kevin Skadron. 2007. “A\nHardware Redundancy and Recovery Mechanism for Reliable Scientific\nComputation on Graphics Processors.” In Graphics\nHardware, 2007:55–64. Citeseer.\n\n\nShehabi, Arman, Sarah Smith, Dale Sartor, Richard Brown, Magnus Herrlin,\nJonathan Koomey, Eric Masanet, Nathaniel Horner, Inês Azevedo, and\nWilliam Lintner. 2016. “United States Data Center Energy Usage\nReport.”\n\n\nShen, Sheng, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami,\nMichael W. Mahoney, and Kurt Keutzer. 2020. “Q-BERT:\nHessian Based Ultra Low Precision Quantization of\nBERT.” Proceedings of the AAAI Conference on\nArtificial Intelligence 34 (05): 8815–21. https://doi.org/10.1609/aaai.v34i05.6409.\n\n\nSheng, Victor S., and Jing Zhang. 2019. “Machine Learning with\nCrowdsourcing: A Brief Summary of the Past Research and\nFuture Directions.” Proceedings of the AAAI Conference on\nArtificial Intelligence 33 (01): 9837–43. https://doi.org/10.1609/aaai.v33i01.33019837.\n\n\nShi, Hongrui, and Valentin Radu. 2022. “Data Selection for\nEfficient Model Update in Federated Learning.” In Proceedings\nof the 2nd European Workshop on Machine Learning and Systems,\n72–78. ACM. https://doi.org/10.1145/3517207.3526980.\n\n\nShneiderman, Ben. 2020. “Bridging the Gap Between Ethics and\nPractice: Guidelines for Reliable, Safe, and Trustworthy Human-Centered\nAI Systems.” ACM Trans. Interact. Intell. Syst. 10 (4):\n1–31. https://doi.org/10.1145/3419764.\n\n\n———. 2022. Human-Centered AI. Oxford University\nPress.\n\n\nShokri, Reza, Marco Stronati, Congzheng Song, and Vitaly Shmatikov.\n2017. “Membership Inference Attacks Against Machine Learning\nModels.” In 2017 IEEE Symposium on Security and Privacy\n(SP), 3–18. IEEE; IEEE. https://doi.org/10.1109/sp.2017.41.\n\n\nSiddik, Md Abu Bakar, Arman Shehabi, and Landon Marston. 2021.\n“The Environmental Footprint of Data Centers in the United\nStates.” Environ. Res. Lett. 16 (6): 064017. https://doi.org/10.1088/1748-9326/abfba1.\n\n\nSilvestro, Daniele, Stefano Goria, Thomas Sterner, and Alexandre\nAntonelli. 2022. “Improving Biodiversity Protection Through\nArtificial Intelligence.” Nature Sustainability 5 (5):\n415–24. https://doi.org/10.1038/s41893-022-00851-6.\n\n\nSingh, Narendra, and Oladele A. Ogunseitan. 2022. “Disentangling\nthe Worldwide Web of e-Waste and Climate Change Co-Benefits.”\nCircular Economy 1 (2): 100011. https://doi.org/10.1016/j.cec.2022.100011.\n\n\nSkorobogatov, Sergei. 2009. “Local Heating Attacks on Flash Memory\nDevices.” In 2009 IEEE International Workshop on\nHardware-Oriented Security and Trust, 1–6. IEEE; IEEE. https://doi.org/10.1109/hst.2009.5225028.\n\n\nSkorobogatov, Sergei P, and Ross J Anderson. 2003. “Optical Fault\nInduction Attacks.” In Cryptographic Hardware and Embedded\nSystems-CHES 2002: 4th International Workshop Redwood Shores, CA, USA,\nAugust 1315, 2002 Revised Papers 4, 2–12. Springer.\n\n\nSmilkov, Daniel, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin\nWattenberg. 2017. “Smoothgrad: Removing Noise by\nAdding Noise.” ArXiv Preprint abs/1706.03825. https://arxiv.org/abs/1706.03825.\n\n\nSnoek, Jasper, Hugo Larochelle, and Ryan P. Adams. 2012.\n“Practical Bayesian Optimization of Machine Learning\nAlgorithms.” In Advances in Neural Information Processing\nSystems 25: 26th Annual Conference on Neural Information Processing\nSystems 2012. Proceedings of a Meeting Held December 3-6, 2012, Lake\nTahoe, Nevada, United States, edited by Peter L. Bartlett, Fernando\nC. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q.\nWeinberger, 2960–68. https://proceedings.neurips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html.\n\n\nSrivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever,\nand Ruslan Salakhutdinov. 2014. “Dropout: A Simple\nWay to Prevent Neural Networks from Overfitting.” J. Mach.\nLearn. Res. http://jmlr.org/papers/v15/srivastava14a.html.\n\n\nStm32L4Q5Ag. 2021. STMicroelectronics.\n\n\nStrubell, Emma, Ananya Ganesh, and Andrew McCallum. 2019. “Energy\nand Policy Considerations for Deep Learning in NLP.”\nIn Proceedings of the 57th Annual Meeting of the Association for\nComputational Linguistics, 3645–50. Florence, Italy: Association\nfor Computational Linguistics. https://doi.org/10.18653/v1/p19-1355.\n\n\nSuda, Naveen, Vikas Chandra, Ganesh Dasika, Abinash Mohanty, Yufei Ma,\nSarma Vrudhula, Jae-sun Seo, and Yu Cao. 2016.\n“Throughput-Optimized OpenCL-Based FPGA\nAccelerator for Large-Scale Convolutional Neural Networks.” In\nProceedings of the 2016 ACM/SIGDA International Symposium on\nField-Programmable Gate Arrays, 16–25. ACM. https://doi.org/10.1145/2847263.2847276.\n\n\nSudhakar, Soumya, Vivienne Sze, and Sertac Karaman. 2023. “Data\nCenters on Wheels: Emissions from Computing Onboard\nAutonomous Vehicles.” IEEE Micro 43 (1): 29–39. https://doi.org/10.1109/mm.2022.3219803.\n\n\nSze, Vivienne, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2017.\n“Efficient Processing of Deep Neural Networks: A\nTutorial and Survey.” Proc. IEEE 105 (12): 2295–2329. https://doi.org/10.1109/jproc.2017.2761740.\n\n\nSzegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna,\nDumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014.\n“Intriguing Properties of Neural Networks.” In 2nd\nInternational Conference on Learning Representations, ICLR 2014, Banff,\nAB, Canada, April 14-16, 2014, Conference Track Proceedings, edited\nby Yoshua Bengio and Yann LeCun. http://arxiv.org/abs/1312.6199.\n\n\nTambe, Thierry, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa\nReddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020.\n“Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings\nfor Resilient Deep Learning Inference.” In 2020 57th ACM/IEEE\nDesign Automation Conference (DAC), 1–6. IEEE; IEEE. https://doi.org/10.1109/dac18072.2020.9218516.\n\n\nTan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler,\nAndrew Howard, and Quoc V. Le. 2019. “MnasNet: Platform-aware Neural Architecture Search for\nMobile.” In 2019 IEEE/CVF Conference on Computer Vision and\nPattern Recognition (CVPR), 2820–28. IEEE. https://doi.org/10.1109/cvpr.2019.00293.\n\n\nTan, Mingxing, and Quoc V. Le. 2023. “Demystifying Deep\nLearning.” Wiley. https://doi.org/10.1002/9781394205639.ch6.\n\n\nTang, Xin, Yichun He, and Jia Liu. 2022. “Soft Bioelectronics for\nCardiac Interfaces.” Biophysics Reviews 3 (1). https://doi.org/10.1063/5.0069516.\n\n\nTang, Xin, Hao Shen, Siyuan Zhao, Na Li, and Jia Liu. 2023.\n“Flexible Braincomputer Interfaces.”\nNature Electronics 6 (2): 109–18. https://doi.org/10.1038/s41928-022-00913-9.\n\n\nTarun, Ayush K, Vikram S Chundawat, Murari Mandal, and Mohan\nKankanhalli. 2022. “Deep Regression Unlearning.” ArXiv\nPreprint abs/2210.08196. https://arxiv.org/abs/2210.08196.\n\n\nTeam, The Theano Development, Rami Al-Rfou, Guillaume Alain, Amjad\nAlmahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, et\nal. 2016. “Theano: A Python Framework for Fast\nComputation of Mathematical Expressions.” https://arxiv.org/abs/1605.02688.\n\n\n“The Ultimate Guide to Deep Learning Model Quantization and\nQuantization-Aware Training.” n.d. https://deci.ai/quantization-and-quantization-aware-training/.\n\n\nThompson, Neil C., Kristjan Greenewald, Keeheon Lee, and Gabriel F.\nManso. 2021. “Deep Learning’s Diminishing Returns:\nThe Cost of Improvement Is Becoming Unsustainable.”\nIEEE Spectr. 58 (10): 50–55. https://doi.org/10.1109/mspec.2021.9563954.\n\n\nTill, Aaron, Andrew L. Rypel, Andrew Bray, and Samuel B. Fey. 2019.\n“Fish Die-Offs Are Concurrent with Thermal Extremes in North\nTemperate Lakes.” Nat. Clim. Change 9 (8): 637–41. https://doi.org/10.1038/s41558-019-0520-y.\n\n\nTirtalistyani, Rose, Murtiningrum Murtiningrum, and Rameshwar S. Kanwar.\n2022. “Indonesia Rice Irrigation System:\nTime for Innovation.” Sustainability 14\n(19): 12477. https://doi.org/10.3390/su141912477.\n\n\nTokui, Seiya, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa,\nShunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, and Hiroyuki\nYamazaki Vincent. 2019. “Chainer: A Deep Learning Framework for\nAccelerating the Research Cycle.” In Proceedings of the 25th\nACM SIGKDD International Conference on Knowledge Discovery &Amp;\nData Mining, 5:1–6. ACM. https://doi.org/10.1145/3292500.3330756.\n\n\nTramèr, Florian, Pascal Dupré, Gili Rusak, Giancarlo Pellegrino, and Dan\nBoneh. 2019. “AdVersarial: Perceptual Ad Blocking\nMeets Adversarial Machine Learning.” In Proceedings of the\n2019 ACM SIGSAC Conference on Computer and Communications Security,\n2005–21. ACM. https://doi.org/10.1145/3319535.3354222.\n\n\nTsai, Min-Jen, Ping-Yi Lin, and Ming-En Lee. 2023. “Adversarial\nAttacks on Medical Image Classification.” Cancers 15\n(17): 4228. https://doi.org/10.3390/cancers15174228.\n\n\nTsai, Timothy, Siva Kumar Sastry Hari, Michael Sullivan, Oreste Villa,\nand Stephen W. Keckler. 2021. “NVBitFI:\nDynamic Fault Injection for GPUs.” In\n2021 51st Annual IEEE/IFIP International Conference on Dependable\nSystems and Networks (DSN), 284–91. IEEE; IEEE. https://doi.org/10.1109/dsn48987.2021.00041.\n\n\nUddin, Mueen, and Azizah Abdul Rahman. 2012. “Energy Efficiency\nand Low Carbon Enabler Green IT Framework for Data Centers\nConsidering Green Metrics.” Renewable Sustainable Energy\nRev. 16 (6): 4078–94. https://doi.org/10.1016/j.rser.2012.03.014.\n\n\nUn, and World Economic Forum. 2019. A New Circular Vision for\nElectronics, Time for a Global Reboot. PACE - Platform for\nAccelerating the Circular Economy. https://www3.weforum.org/docs/WEF\\_A\\_New\\_Circular\\_Vision\\_for\\_Electronics.pdf.\n\n\nValenzuela, Christine L, and Pearl Y Wang. 2000. “A Genetic\nAlgorithm for VLSI Floorplanning.” In Parallel\nProblem Solving from Nature PPSN VI: 6th International Conference Paris,\nFrance, September 1820, 2000 Proceedings 6, 671–80.\nSpringer.\n\n\nVan Noorden, Richard. 2016. “ArXiv Preprint Server\nPlans Multimillion-Dollar Overhaul.” Nature 534 (7609):\n602–2. https://doi.org/10.1038/534602a.\n\n\nVangal, Sriram, Somnath Paul, Steven Hsu, Amit Agarwal, Saurabh Kumar,\nRam Krishnamurthy, Harish Krishnamurthy, James Tschanz, Vivek De, and\nChris H. Kim. 2021. “Wide-Range Many-Core SoC Design\nin Scaled CMOS: Challenges and\nOpportunities.” IEEE Trans. Very Large Scale Integr. VLSI\nSyst. 29 (5): 843–56. https://doi.org/10.1109/tvlsi.2021.3061649.\n\n\nVaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion\nJones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.\n“Attention Is All You Need.” Advances in Neural\nInformation Processing Systems 30.\n\n\n“Vector-Borne Diseases.” n.d.\nhttps://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases.\n\n\nVelazco, Raoul, Gilles Foucard, and Paul Peronnard. 2010.\n“Combining Results of Accelerated Radiation Tests and Fault\nInjections to Predict the Error Rate of an Application Implemented in\nSRAM-Based FPGAs.” IEEE Trans.\nNucl. Sci. 57 (6): 3500–3505. https://doi.org/10.1109/tns.2010.2087355.\n\n\nVerma, Naveen, Hongyang Jia, Hossein Valavi, Yinqi Tang, Murat Ozatay,\nLung-Yen Chen, Bonan Zhang, and Peter Deaville. 2019. “In-Memory\nComputing: Advances and Prospects.” IEEE\nSolid-State Circuits Mag. 11 (3): 43–55. https://doi.org/10.1109/mssc.2019.2922889.\n\n\nVerma, Team Dual_Boot: Swapnil. 2022. “Elephant\nAI.” Hackster.io. https://www.hackster.io/dual\\_boot/elephant-ai-ba71e9.\n\n\nVinuesa, Ricardo, Hossein Azizpour, Iolanda Leite, Madeline Balaam,\nVirginia Dignum, Sami Domisch, Anna Felländer, Simone Daniela Langhans,\nMax Tegmark, and Francesco Fuso Nerini. 2020. “The Role of\nArtificial Intelligence in Achieving the Sustainable Development\nGoals.” Nat. Commun. 11 (1): 1–10. https://doi.org/10.1038/s41467-019-14108-y.\n\n\nVivet, Pascal, Eric Guthmuller, Yvain Thonnart, Gael Pillonnet, Cesar\nFuguet, Ivan Miro-Panades, Guillaume Moritz, et al. 2021.\n“IntAct: A 96-Core Processor with Six\nChiplets 3D-Stacked on an Active Interposer with\nDistributed Interconnects and Integrated Power Management.”\nIEEE J. Solid-State Circuits 56 (1): 79–97. https://doi.org/10.1109/jssc.2020.3036341.\n\n\nWachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017.\n“Counterfactual Explanations Without Opening the Black Box:\nAutomated Decisions and the GDPR.”\nSSRN Electronic Journal 31: 841. https://doi.org/10.2139/ssrn.3063289.\n\n\nWald, Peter H., and Jeffrey R. Jones. 1987. “Semiconductor\nManufacturing: An Introduction to Processes and\nHazards.” Am. J. Ind. Med. 11 (2): 203–21. https://doi.org/10.1002/ajim.4700110209.\n\n\nWan, Zishen, Aqeel Anwar, Yu-Shun Hsiao, Tianyu Jia, Vijay Janapa Reddi,\nand Arijit Raychowdhury. 2021. “Analyzing and Improving Fault\nTolerance of Learning-Based Navigation Systems.” In 2021 58th\nACM/IEEE Design Automation Conference (DAC), 841–46. IEEE; IEEE. https://doi.org/10.1109/dac18074.2021.9586116.\n\n\nWan, Zishen, Yiming Gan, Bo Yu, S Liu, A Raychowdhury, and Y Zhu. 2023.\n“Vpp: The Vulnerability-Proportional Protection\nParadigm Towards Reliable Autonomous Machines.” In\nProceedings of the 5th International Workshop on Domain Specific\nSystem Architecture (DOSSA), 1–6.\n\n\nWang, LingFeng, and YaQing Zhan. 2019a. “A Conceptual Peer Review\nModel for arXiv and Other Preprint\nDatabases.” Learn. Publ. 32 (3): 213–19. https://doi.org/10.1002/leap.1229.\n\n\n———. 2019b. “A Conceptual Peer Review Model for arXiv and Other Preprint Databases.”\nLearn. Publ. 32 (3): 213–19. https://doi.org/10.1002/leap.1229.\n\n\nWang, Tianzhe, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang,\nYujun Lin, and Song Han. 2020. “APQ:\nJoint Search for Network Architecture, Pruning and\nQuantization Policy.” In 2020 IEEE/CVF Conference on Computer\nVision and Pattern Recognition (CVPR), 2075–84. IEEE. https://doi.org/10.1109/cvpr42600.2020.00215.\n\n\nWarden, Pete. 2018. “Speech Commands: A Dataset for\nLimited-Vocabulary Speech Recognition.” arXiv Preprint\narXiv:1804.03209.\n\n\nWarden, Pete, and Daniel Situnayake. 2019. Tinyml:\nMachine Learning with Tensorflow Lite on Arduino and\nUltra-Low-Power Microcontrollers. O’Reilly Media.\n\n\nWeik, Martin H. 1955. A Survey of Domestic Electronic Digital\nComputing Systems. Ballistic Research Laboratories.\n\n\nWeiser, Mark. 1991. “The Computer for the 21st Century.”\nSci. Am. 265 (3): 94–104. https://doi.org/10.1038/scientificamerican0991-94.\n\n\nWess, Matthias, Matvey Ivanov, Christoph Unger, and Anvesh Nookala.\n2020. “ANNETTE: Accurate Neural Network Execution Time Estimation\nwith Stacked Models.” IEEE. https://doi.org/10.1109/ACCESS.2020.3047259.\n\n\nWiener, Norbert. 1960. “Some Moral and Technical Consequences of\nAutomation: As Machines Learn They May Develop Unforeseen Strategies at\nRates That Baffle Their Programmers.” Science 131\n(3410): 1355–58. https://doi.org/10.1126/science.131.3410.1355.\n\n\nWilkening, Mark, Vilas Sridharan, Si Li, Fritz Previlon, Sudhanva\nGurumurthi, and David R. Kaeli. 2014. “Calculating Architectural\nVulnerability Factors for Spatial Multi-Bit Transient Faults.” In\n2014 47th Annual IEEE/ACM International Symposium on\nMicroarchitecture, 293–305. IEEE; IEEE. https://doi.org/10.1109/micro.2014.15.\n\n\nWinkler, Harald, Franck Lecocq, Hans Lofgren, Maria Virginia Vilariño,\nSivan Kartha, and Joana Portugal-Pereira. 2022. “Examples of\nShifting Development Pathways: Lessons on How to Enable\nBroader, Deeper, and Faster Climate Action.” Climate\nAction 1 (1). https://doi.org/10.1007/s44168-022-00026-1.\n\n\nWong, H.-S. Philip, Heng-Yuan Lee, Shimeng Yu, Yu-Sheng Chen, Yi Wu,\nPang-Shiu Chen, Byoungil Lee, Frederick T. Chen, and Ming-Jinn Tsai.\n2012. “MetalOxide\nRRAM.” Proc. IEEE 100 (6): 1951–70. https://doi.org/10.1109/jproc.2012.2190369.\n\n\nWu, Bichen, Kurt Keutzer, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang,\nFei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, and Yangqing Jia. 2019.\n“FBNet: Hardware-aware\nEfficient ConvNet Design via Differentiable Neural\nArchitecture Search.” In 2019 IEEE/CVF Conference on Computer\nVision and Pattern Recognition (CVPR), 10734–42. IEEE. https://doi.org/10.1109/cvpr.2019.01099.\n\n\nWu, Carole-Jean, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha\nArdalani, Kiwan Maeng, Gloria Chang, et al. 2022. “Sustainable Ai:\nEnvironmental Implications, Challenges and\nOpportunities.” Proceedings of Machine Learning and\nSystems 4: 795–813.\n\n\nWu, Zhang Judd, and Micikevicius Isaev. 2020. “Integer\nQuantization for Deep Learning Inference: Principles and\nEmpirical Evaluation).” ArXiv Preprint. https://arxiv.org/abs/2004.09602.\n\n\nXiao, Seznec Lin, Demouth Wu, and Han. 2022.\n“SmoothQuant: Accurate and Efficient\nPost-Training Quantization for Large Language Models.” ArXiv\nPreprint. https://arxiv.org/abs/2211.10438.\n\n\nXie, Cihang, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L. Yuille, and\nQuoc V. Le. 2020. “Adversarial Examples Improve Image\nRecognition.” In 2020 IEEE/CVF Conference on Computer Vision\nand Pattern Recognition (CVPR), 816–25. IEEE. https://doi.org/10.1109/cvpr42600.2020.00090.\n\n\nXie, Saining, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He.\n2017. “Aggregated Residual Transformations for Deep Neural\nNetworks.” In 2017 IEEE Conference on Computer Vision and\nPattern Recognition (CVPR), 1492–1500. IEEE. https://doi.org/10.1109/cvpr.2017.634.\n\n\nXinyu, Chen. n.d.\n\n\nXiong, Siyu, Guoqing Wu, Xitian Fan, Xuan Feng, Zhongcheng Huang, Wei\nCao, Xuegong Zhou, et al. 2021. “MRI-Based Brain\nTumor Segmentation Using FPGA-Accelerated Neural\nNetwork.” BMC Bioinf. 22 (1): 421. https://doi.org/10.1186/s12859-021-04347-6.\n\n\nXiu, Liming. 2019. “Time Moore: Exploiting Moore’s Law from the Perspective of Time.”\nIEEE Solid-State Circuits Mag. 11 (1): 39–55. https://doi.org/10.1109/mssc.2018.2882285.\n\n\nXu, Chen, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong\nWang, and Hongbin Zha. 2018. “Alternating Multi-Bit Quantization\nfor Recurrent Neural Networks.” In 6th International\nConference on Learning Representations, ICLR 2018, Vancouver, BC,\nCanada, April 30 - May 3, 2018, Conference Track Proceedings.\nOpenReview.net. https://openreview.net/forum?id=S19dR9x0b.\n\n\nXu, Hu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes,\nVasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, and Christoph\nFeichtenhofer. 2023. “Demystifying CLIP Data.”\nArXiv Preprint abs/2309.16671. https://arxiv.org/abs/2309.16671.\n\n\nXu, Ying, Xu Zhong, Antonio Jimeno Yepes, and Jey Han Lau. 2021.\n“Grey-Box Adversarial Attack and Defence for\nSentiment Classification.” arXiv Preprint\narXiv:2103.11576.\n\n\nXu, Zheng, Yanxiang Zhang, Galen Andrew, Christopher A Choquette-Choo,\nPeter Kairouz, H Brendan McMahan, Jesse Rosenstock, and Yuanbo Zhang.\n2023. “Federated Learning of Gboard Language Models with\nDifferential Privacy.” ArXiv Preprint abs/2305.18465. https://arxiv.org/abs/2305.18465.\n\n\nYang, Tien-Ju, Yonghui Xiao, Giovanni Motta, Françoise Beaufays, Rajiv\nMathews, and Mingqing Chen. 2023. “Online Model Compression for\nFederated Learning with Large Models.” In ICASSP 2023 - 2023\nIEEE International Conference on Acoustics, Speech and Signal Processing\n(ICASSP), 1–5. IEEE; IEEE. https://doi.org/10.1109/icassp49357.2023.10097124.\n\n\nYe, Linfeng, and Shayan Mohajer Hamidi. 2021. “Thundernna:\nA White Box Adversarial Attack.” arXiv Preprint\narXiv:2111.12305.\n\n\nYeh, Y. C. 1996. “Triple-Triple Redundant 777 Primary Flight\nComputer.” In 1996 IEEE Aerospace Applications Conference.\nProceedings, 1:293–307. IEEE; IEEE. https://doi.org/10.1109/aero.1996.495891.\n\n\nYik, Jason, Soikat Hasan Ahmed, Zergham Ahmed, Brian Anderson, Andreas\nG. Andreou, Chiara Bartolozzi, Arindam Basu, et al. 2023.\n“NeuroBench: Advancing Neuromorphic\nComputing Through Collaborative, Fair and Representative\nBenchmarking.” https://arxiv.org/abs/2304.04640.\n\n\nYou, Jie, Jae-Won Chung, and Mosharaf Chowdhury. 2023. “Zeus:\nUnderstanding and Optimizing GPU Energy\nConsumption of DNN Training.” In 20th USENIX\nSymposium on Networked Systems Design and Implementation (NSDI 23),\n119–39. Boston, MA: USENIX Association. https://www.usenix.org/conference/nsdi23/presentation/you.\n\n\nYou, Yang, Zhao Zhang, Cho-Jui Hsieh, James Demmel, and Kurt Keutzer.\n2018. “ImageNet Training in Minutes.” https://arxiv.org/abs/1709.05011.\n\n\nYoung, Tom, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. 2018.\n“Recent Trends in Deep Learning Based Natural Language Processing\n[Review Article].” IEEE Comput. Intell.\nMag. 13 (3): 55–75. https://doi.org/10.1109/mci.2018.2840738.\n\n\nYu, Yuan, Martı́n Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy\nDavis, Jeff Dean, et al. 2018. “Dynamic Control Flow in\nLarge-Scale Machine Learning.” In Proceedings of the\nThirteenth EuroSys Conference, 265–83. ACM. https://doi.org/10.1145/3190508.3190551.\n\n\nZafrir, Ofir, Guy Boudoukh, Peter Izsak, and Moshe Wasserblat. 2019.\n“Q8BERT: Quantized 8Bit\nBERT.” In 2019 Fifth Workshop on Energy\nEfficient Machine Learning and Cognitive Computing - NeurIPS Edition\n(EMC2-NIPS), 36–39. IEEE; IEEE. https://doi.org/10.1109/emc2-nips53020.2019.00016.\n\n\nZeiler, Matthew D. 2012. “Reinforcement and Systemic Machine\nLearning for Decision Making.” Wiley. https://doi.org/10.1002/9781118266502.ch6.\n\n\nZennaro, Marco, Brian Plancher, and V Janapa Reddi. 2022.\n“TinyML: Applied AI for\nDevelopment.” In The UN 7th Multi-Stakeholder Forum on\nScience, Technology and Innovation for the Sustainable Development\nGoals, 2022–05.\n\n\nZhang, Chen, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason\nOptimizing Cong. 2015. “FPGA-Based Accelerator Design\nfor Deep Convolutional Neural Networks Proceedings of the 2015\nACM.” In SIGDA International Symposium on\nField-Programmable Gate Arrays-FPGA, 15:161–70.\n\n\nZhang, Dan, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna\nGoldie, and Azalia Mirhoseini. 2022. “A Full-Stack Search\nTechnique for Domain Optimized Deep Learning Accelerators.” In\nProceedings of the 27th ACM International Conference on\nArchitectural Support for Programming Languages and Operating\nSystems, 27–42. ASPLOS ’22. New York, NY, USA: ACM. https://doi.org/10.1145/3503222.3507767.\n\n\nZhang, Dongxia, Xiaoqing Han, and Chunyu Deng. 2018. “Review on\nthe Research and Practice of Deep Learning and Reinforcement Learning in\nSmart Grids.” CSEE Journal of Power and Energy Systems 4\n(3): 362–70. https://doi.org/10.17775/cseejpes.2018.00520.\n\n\nZhang, Hongyu. 2008. “On the Distribution of Software\nFaults.” IEEE Trans. Software Eng. 34 (2): 301–2. https://doi.org/10.1109/tse.2007.70771.\n\n\nZhang, Jeff Jun, Tianyu Gu, Kanad Basu, and Siddharth Garg. 2018.\n“Analyzing and Mitigating the Impact of Permanent Faults on a\nSystolic Array Based Neural Network Accelerator.” In 2018\nIEEE 36th VLSI Test Symposium (VTS), 1–6. IEEE; IEEE. https://doi.org/10.1109/vts.2018.8368656.\n\n\nZhang, Jeff, Kartheek Rangineni, Zahra Ghodsi, and Siddharth Garg. 2018.\n“ThUnderVolt: Enabling Aggressive\nVoltage Underscaling and Timing Error Resilience for Energy Efficient\nDeep Learning Accelerators.” In 2018 55th ACM/ESDA/IEEE\nDesign Automation Conference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465918.\n\n\nZhang, Li Lyna, Yuqing Yang, Yuhang Jiang, Wenwu Zhu, and Yunxin Liu.\n2020. “Fast Hardware-Aware Neural Architecture Search.” In\n2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition\nWorkshops (CVPRW). IEEE. https://doi.org/10.1109/cvprw50498.2020.00354.\n\n\nZhang, Qingxue, Dian Zhou, and Xuan Zeng. 2017. “Highly Wearable\nCuff-Less Blood Pressure and Heart Rate Monitoring with Single-Arm\nElectrocardiogram and Photoplethysmogram Signals.” BioMedical\nEngineering OnLine 16 (1): 23. https://doi.org/10.1186/s12938-017-0317-z.\n\n\nZhang, Tunhou, Hsin-Pai Cheng, Zhenwen Li, Feng Yan, Chengyu Huang, Hai\nHelen Li, and Yiran Chen. 2020. “AutoShrink:\nA Topology-Aware NAS for Discovering Efficient\nNeural Architecture.” In The Thirty-Fourth AAAI Conference on\nArtificial Intelligence, AAAI 2020, the Thirty-Second Innovative\nApplications of Artificial Intelligence Conference, IAAI 2020, the Tenth\nAAAI Symposium on Educational Advances in Artificial Intelligence, EAAI\n2020, New York, NY, USA, February 7-12, 2020, 6829–36. AAAI Press.\nhttps://aaai.org/ojs/index.php/AAAI/article/view/6163.\n\n\nZhao, Mark, and G. Edward Suh. 2018. “FPGA-Based\nRemote Power Side-Channel Attacks.” In 2018 IEEE Symposium on\nSecurity and Privacy (SP), 229–44. IEEE; IEEE. https://doi.org/10.1109/sp.2018.00049.\n\n\nZhao, Yue, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas\nChandra. 2018. “Federated Learning with Non-Iid Data.”\nArXiv Preprint abs/1806.00582. https://arxiv.org/abs/1806.00582.\n\n\nZhou, Bolei, Yiyou Sun, David Bau, and Antonio Torralba. 2018.\n“Interpretable Basis Decomposition for Visual Explanation.”\nIn Proceedings of the European Conference on Computer Vision\n(ECCV), 119–34.\n\n\nZhou, Chuteng, Fernando Garcia Redondo, Julian Büchel, Irem Boybat,\nXavier Timoneda Comas, S. R. Nandakumar, Shidhartha Das, Abu Sebastian,\nManuel Le Gallo, and Paul N. Whatmough. 2021.\n“AnalogNets: Ml-hw\nCo-Design of Noise-Robust TinyML Models and Always-on\nAnalog Compute-in-Memory Accelerator.” https://arxiv.org/abs/2111.06503.\n\n\nZhou, Guanglei, and Jason H. Anderson. 2023. “Area-Driven\nFPGA Logic Synthesis Using Reinforcement Learning.”\nIn Proceedings of the 28th Asia and South Pacific Design Automation\nConference, 159–65. ACM. https://doi.org/10.1145/3566097.3567894.\n\n\nZhou, Peng, Xintong Han, Vlad I. Morariu, and Larry S. Davis. 2018.\n“Learning Rich Features for Image Manipulation Detection.”\nIn 2018 IEEE/CVF Conference on Computer Vision and Pattern\nRecognition, 1053–61. IEEE. https://doi.org/10.1109/cvpr.2018.00116.\n\n\nZhu, Hongyu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand\nJayarajan, Amar Phanishayee, Bianca Schroeder, and Gennady Pekhimenko.\n2018. “Benchmarking and Analyzing Deep Neural Network\nTraining.” In 2018 IEEE International Symposium on Workload\nCharacterization (IISWC), 88–100. IEEE; IEEE. https://doi.org/10.1109/iiswc.2018.8573476.\n\n\nZhu, Ligeng, Lanxiang Hu, Ji Lin, Wei-Ming Chen, Wei-Chen Wang, Chuang\nGan, and Song Han. 2023. “PockEngine:\nSparse and Efficient Fine-Tuning in a Pocket.” In\n56th Annual IEEE/ACM International Symposium on\nMicroarchitecture. ACM. https://doi.org/10.1145/3613424.3614307.\n\n\nZhuang, Fuzhen, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu\nZhu, Hui Xiong, and Qing He. 2021. “A Comprehensive Survey on\nTransfer Learning.” Proc. IEEE 109 (1): 43–76. https://doi.org/10.1109/jproc.2020.3004555.\n\n\nZoph, Barret, and Quoc V. Le. 2023. “Cybernetical\nIntelligence.” Wiley. https://doi.org/10.1002/9781394217519.ch17.", "crumbs": [ "REFERENCES", "References"