在往期的集微訪談欄目中,愛集微有幸采訪了Axelera AI首席執(zhí)行官兼聯(lián)合創(chuàng)始人Fabrizio Del Maffeo,就RISC-V開源技術(shù)、AI芯片發(fā)展、存內(nèi)計(jì)算、初創(chuàng)企業(yè)模式、數(shù)據(jù)訪問權(quán)限等一系列問題,得到了十分具有啟發(fā)性的答復(fù)。
問:我的第一個(gè)問題是,與馮·諾依曼架構(gòu)相比,存內(nèi)計(jì)算有什么特點(diǎn)?與傳統(tǒng)架構(gòu)相比,它有什么優(yōu)勢(shì)?
答:感謝提問。存內(nèi)計(jì)算的優(yōu)勢(shì)在于,您可以用一種獨(dú)特的方式進(jìn)行并行計(jì)算:從本質(zhì)上講,您是把一個(gè)存儲(chǔ)陣列(通常很大,可以是26萬個(gè)元件,也可以是一百萬個(gè)元件)直接用作計(jì)算引擎。其優(yōu)點(diǎn)在于高度并行化,這意味著高吞吐量和低數(shù)據(jù)傳輸量;又因?yàn)橛?jì)算就在存儲(chǔ)內(nèi)完成,這意味著低功耗和低成本,因?yàn)槟愫喜⒘舜鎯?chǔ)區(qū)域和計(jì)算元件。
這樣,整體面積更小,芯片成本也就更低。存內(nèi)計(jì)算分為兩種:一種是模擬存內(nèi)計(jì)算,另一種是數(shù)字存內(nèi)計(jì)算。在模擬存內(nèi)計(jì)算中,你利用晶體管和存儲(chǔ)單元中電流與電壓之間的關(guān)系,來完成神經(jīng)網(wǎng)絡(luò)中的各種計(jì)算,也就是向量矩陣乘法。
這是一種方法,對(duì)吧?但當(dāng)你在模擬域中處理時(shí),輸入的是數(shù)字信號(hào)數(shù)據(jù):你需要先將其轉(zhuǎn)換為模擬信號(hào),進(jìn)行計(jì)算,最后再將結(jié)果轉(zhuǎn)換回?cái)?shù)字信號(hào)。模擬存內(nèi)計(jì)算的問題在于模擬域中存在噪聲,噪聲一旦出現(xiàn),就會(huì)改變計(jì)算的結(jié)果。因此典型的模擬存內(nèi)計(jì)算芯片精度通常都不高,你必須對(duì)網(wǎng)絡(luò)和芯片進(jìn)行微調(diào),才能恢復(fù)到不錯(cuò)的精度。在Axelera,我們擁有模擬存內(nèi)計(jì)算的技術(shù),但我們不使用它。我們使用與之不同的數(shù)字存內(nèi)計(jì)算,因?yàn)檫@樣就不需要進(jìn)行數(shù)模和模數(shù)轉(zhuǎn)換,也不需要在模擬域里計(jì)算。我們只是在SRAM陣列中,在靠近每個(gè)存儲(chǔ)單元的地方嵌入一個(gè)計(jì)算元件來做乘法運(yùn)算,再用一個(gè)加法器樹來做累加。這讓我們能在數(shù)字域進(jìn)行計(jì)算,把存儲(chǔ)和計(jì)算集中在一個(gè)很小的區(qū)域內(nèi),同時(shí)也實(shí)現(xiàn)了并行化計(jì)算。
這意味著高吞吐量;低成本,因?yàn)樾酒×?;更少的?shù)據(jù)傳輸,這意味著低功耗;以及高精度,因?yàn)槲覀冊(cè)跀?shù)字域做計(jì)算。
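上面描述的"每個(gè)存儲(chǔ)單元旁做乘法、再用加法器樹累加"的數(shù)字存內(nèi)計(jì)算流程,可以用下面的Python草圖在功能層面示意一下(純屬假設(shè)性模擬,數(shù)組尺寸與數(shù)據(jù)類型均為舉例,并非Axelera的實(shí)際電路):

```python
import numpy as np

def digital_imc_matvec(weights, x):
    """數(shù)字存內(nèi)計(jì)算的功能級(jí)示意:
    逐元素乘法對(duì)應(yīng)嵌在每個(gè)存儲(chǔ)單元旁的乘法元件(全部并行),
    隨后的循環(huán)每一輪對(duì)應(yīng)加法器樹的一級(jí)兩兩求和。"""
    products = weights * x                     # 各單元旁的乘法元件并行工作
    acc = products
    while acc.shape[-1] > 1:                   # 加法器樹:log2(N) 級(jí)后得到總和
        if acc.shape[-1] % 2:                  # 奇數(shù)長度時(shí)補(bǔ)零對(duì)齊
            pad = np.zeros((*acc.shape[:-1], 1), dtype=acc.dtype)
            acc = np.concatenate([acc, pad], axis=-1)
        acc = acc[..., 0::2] + acc[..., 1::2]  # 相鄰元素兩兩相加
    return acc[..., 0]

W = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int32)  # 權(quán)重駐留在陣列中
x = np.array([1, 0, 2, 1], dtype=np.int32)                  # 輸入向量
print(digital_imc_matvec(W, x))                             # 與 W @ x 的結(jié)果一致
```

功能上它與普通的矩陣向量乘 `W @ x` 等價(jià),區(qū)別只在于把乘法和累加的"位置"畫在了存儲(chǔ)陣列內(nèi)部。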
問:您認(rèn)為存內(nèi)計(jì)算會(huì)擴(kuò)展到更通用的計(jì)算領(lǐng)域,還是更適合專用計(jì)算?
答:確實(shí),存內(nèi)計(jì)算只用來做一件事,那就是向量和矩陣之間的乘法運(yùn)算。而如果你深入研究神經(jīng)網(wǎng)絡(luò)(循環(huán)神經(jīng)網(wǎng)絡(luò)、卷積神經(jīng)網(wǎng)絡(luò)、LSTM網(wǎng)絡(luò)、Transformer網(wǎng)絡(luò)),70%到90%的計(jì)算都只是向量矩陣乘法。
而這些向量矩陣乘法全都可以用存內(nèi)計(jì)算來完成。但存內(nèi)計(jì)算跑不了激活函數(shù),這部分不在存內(nèi)計(jì)算中做。存內(nèi)計(jì)算只做乘法和累加,也就是對(duì)數(shù)字求和,僅此而已。但這些計(jì)算占了神經(jīng)網(wǎng)絡(luò)計(jì)算量的70%到90%,這就是它在人工智能、機(jī)器學(xué)習(xí)和深度學(xué)習(xí)領(lǐng)域如此重要的原因。
但在其他任何領(lǐng)域,你都不會(huì)使用存內(nèi)計(jì)算,除非你就是要做向量矩陣乘法運(yùn)算。
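要直觀感受"70%到90%的計(jì)算都是向量矩陣乘法"這個(gè)量級(jí),可以粗算一個(gè)小型全連接網(wǎng)絡(luò)里乘加運(yùn)算與激活運(yùn)算的比例(層的尺寸是假設(shè)性示例,僅作說明):

```python
# 統(tǒng)計(jì)一個(gè)三層全連接網(wǎng)絡(luò)中 MAC(乘加)與激活運(yùn)算的數(shù)量
# 層尺寸為假設(shè)的示例值,并非出自訪談
layers = [(784, 256), (256, 128), (128, 10)]   # (輸入維度, 輸出維度)

mac_ops = sum(n_in * n_out for n_in, n_out in layers)  # 每個(gè)輸出元素需 n_in 次乘加
act_ops = sum(n_out for _, n_out in layers)            # 每個(gè)輸出元素只需 1 次激活

total = mac_ops + act_ops
print(f"乘加運(yùn)算占比:{mac_ops / total:.2%}")
```

在全連接層占主導(dǎo)的網(wǎng)絡(luò)里,這個(gè)占比通常遠(yuǎn)高于訪談中提到的下限;激活函數(shù)等其余運(yùn)算則交給存內(nèi)計(jì)算陣列之外的單元處理。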
問:您認(rèn)為存內(nèi)計(jì)算是否是突破存儲(chǔ)墻(Memory Wall)的一種解決方案?
答:存內(nèi)計(jì)算是向量矩陣乘法的解決方案,僅此而已。要打破存儲(chǔ)墻,還有其他的一些方法,比如鄰存計(jì)算,這種方法略有不同。比如你有一個(gè)更通用的計(jì)算元件,非常小,放在靠近存儲(chǔ)的地方。
這樣,你可以使用數(shù)千個(gè)小CPU和數(shù)千個(gè)小存儲(chǔ),而不是使用一個(gè)巨大的CPU和巨大的存儲(chǔ)。我認(rèn)為這是解決存儲(chǔ)墻的最佳方案,但這并不是真正的存內(nèi)計(jì)算,而是鄰存計(jì)算。兩者的區(qū)別在于:在鄰存計(jì)算中,你仍然有一個(gè)存儲(chǔ)陣列和一個(gè)計(jì)算元件;而在存內(nèi)計(jì)算中,你要分解存儲(chǔ)陣列,并把計(jì)算元件放進(jìn)陣列里。存內(nèi)計(jì)算只能用于乘法和累加,沒有其他用處。
問:RISC-V這種開源技術(shù)能否成為“人工智能民主化”愿景的一部分?
答:是的,RISC-V是其中的一環(huán)??偟膩碚f,在Axelera,我們盡可能保持軟件棧的開放:我們?cè)谑褂瞄_放源代碼,在編譯器的后端使用了TVM,還在固件中使用了由英特爾支持的開源項(xiàng)目Zephyr。我們也嘗試使用oneAPI??傊覀冋诒M可能多地使用開源軟件,同時(shí)回饋社區(qū)。
在Axelera,我們有很多同事都活躍在RISC-V社區(qū),我們也想回饋社區(qū)。我們想開發(fā)一些東西,創(chuàng)建自己的架構(gòu)和產(chǎn)品,但仍然基于開源。而我認(rèn)為,當(dāng)我說我們想讓人工智能民主化時(shí),也就意味著我們想要一個(gè)功能強(qiáng)大、易于使用、成本低廉的產(chǎn)品。
例如,我們?cè)O(shè)計(jì)的解決方案實(shí)現(xiàn)了超過200 TOPS的算力,而我們把它做成定價(jià)149美元的板卡,因?yàn)槲覀兿M藗兪褂盟?,希望讓人們獲得強(qiáng)大的解決方案。賺錢總是有時(shí)間的,但首先要讓人們能夠利用我們的技術(shù)創(chuàng)造出偉大的東西。如果他們成功了,我們也就成功了。因此我們認(rèn)為,重要的是要有一個(gè)易于使用、高性能、低成本的產(chǎn)品,你可以在網(wǎng)上買到,在世界各地都能買到,在任何地方都能用它做出好產(chǎn)品。我們希望釋放創(chuàng)新的活力。
問:對(duì)于 AI 加速芯片來說,采用 RISC-V 有何優(yōu)勢(shì)?
答:優(yōu)勢(shì)在于我們可以掌控它。因?yàn)樗情_源的,所以我們可以自己設(shè)計(jì)、自己控制,不需要回到任何人那里去征求許可,或者索要編譯器的源代碼。無論你使用的是Cadence、Synopsys還是Arm的IP,情況都一樣:你無法獲得所有的權(quán)限,只能開始依賴他們,而這可能成為隱患。因此,從長遠(yuǎn)來看,有了RISC-V,你可以完全掌控自己的架構(gòu)。另一個(gè)好處是,它是一個(gè)經(jīng)過大型社區(qū)檢驗(yàn)的平臺(tái),而且你還可以對(duì)它進(jìn)行擴(kuò)展和開發(fā)。比如,我們正在開發(fā)一個(gè)特定的向量指令集單元,將集成到下一代產(chǎn)品中。我們可以自己做,因?yàn)槲覀冇邢鄳?yīng)的知識(shí);而且它是開源平臺(tái),我們不必與供應(yīng)商協(xié)商來解決問題。
問:您覺得 AI 應(yīng)用會(huì)成為 RISC-V 生態(tài)的重要推動(dòng)力么?
答:我認(rèn)為在針對(duì)特定應(yīng)用設(shè)定的芯片中使用RISC-V比在通用芯片中更容易。因?yàn)樵谔囟☉?yīng)用芯片中,你可以使用RISC-V,并針對(duì)你想做的事情優(yōu)化 RISC-V。
然后,你只需針對(duì)自己的需求對(duì)它進(jìn)行驗(yàn)證。但如果你想把RISC-V用作通用處理器,用它來和英特爾或AMD最先進(jìn)的CPU競(jìng)爭(zhēng),那就另當(dāng)別論了。用RISC-V實(shí)現(xiàn)要困難得多,需要的資源和時(shí)間也多得多,因?yàn)檫@是一個(gè)新架構(gòu),還沒有得到所有人的充分驗(yàn)證。從某種意義上說,當(dāng)芯片復(fù)雜到這種程度,你需要一整個(gè)生態(tài)系統(tǒng)的支持:你需要驅(qū)動(dòng)程序,需要來自社區(qū)、微軟、Ubuntu、Linux的支持??偟膩碚f,這會(huì)更加困難。因此我認(rèn)為,得益于AI,RISC-V將會(huì)發(fā)展壯大;但要成為真正的通用解決方案、替代現(xiàn)有產(chǎn)品,還需要5到10年的時(shí)間。我們還需要再給RISC-V一些時(shí)間,才能看到它應(yīng)用在手機(jī)等設(shè)備上。
問:就計(jì)算效率而言,數(shù)據(jù)中心擁有更好的基礎(chǔ)設(shè)施,算力也更穩(wěn)定。那為什么我們還需要邊緣人工智能?
答:正如你所說,你不需要為了效率而用邊緣人工智能,這個(gè)觀點(diǎn)是正確的。數(shù)據(jù)中心的效率比邊緣好得多,因?yàn)樗阉袞|西都集中起來了;尤其是利用率,相比效率優(yōu)勢(shì),數(shù)據(jù)中心的利用率還要更高,對(duì)吧?但你需要邊緣人工智能,是因?yàn)殡[私、數(shù)據(jù)安全、功能安全和經(jīng)濟(jì)性。想想看,你總不能讓汽車去問云端:我該右轉(zhuǎn)還是左轉(zhuǎn)吧?你的汽車需要有足夠的算力,在幾乎不查詢?cè)贫说那闆r下,無延遲地對(duì)發(fā)生的任何情況及時(shí)做出反應(yīng)。何況在某些地區(qū),網(wǎng)絡(luò)覆蓋都還不到位。
其次,從經(jīng)濟(jì)角度講,把所有數(shù)據(jù)都發(fā)送到云端也是沒有意義的。想想復(fù)雜的監(jiān)控系統(tǒng),有大量高分辨率攝像頭,把所有這些數(shù)據(jù)都發(fā)到云端的成本是極高的,因?yàn)槠渲?5%甚至98%的數(shù)據(jù)都是無用的。你真正想了解、想識(shí)別的東西才有用,隨便舉個(gè)例子,比如某人在火車站丟下的行李,或者警察正在追尋的某個(gè)逃犯。對(duì)于其余那些你不必知道的事,為什么要把所有數(shù)據(jù)都發(fā)到云端?你可以在邊緣提取正確的信息,這樣做成本反而更低。而且在許多地區(qū),其實(shí)連良好的網(wǎng)絡(luò)連接都沒有。這就存在一個(gè)基礎(chǔ)設(shè)施的問題:你無法靠把數(shù)據(jù)發(fā)到云端來解決一切。這時(shí)邊緣計(jì)算就有意義了,它對(duì)許多不同的應(yīng)用都是必要的:無人機(jī)、機(jī)器人、汽車,甚至監(jiān)控系統(tǒng)。
問:對(duì)于邊緣人工智能解決方案,我們還需要克服哪些問題?您已經(jīng)提到了能耗,平臺(tái)可能沒有那么充足的電力。那運(yùn)行條件、光線、延遲、成本或者可維護(hù)性呢?
答:是的,但在我看來,障礙在別處。在云計(jì)算領(lǐng)域,參與的玩家不多:比如中國也就三四家云計(jì)算提供商,美國和歐洲也是如此。就提供商而言,中國和美國在云計(jì)算領(lǐng)域處于領(lǐng)先地位,只有少數(shù)幾家公司在建設(shè)云計(jì)算數(shù)據(jù)中心并提供服務(wù)。
因此,設(shè)計(jì)一項(xiàng)技術(shù)并提供給他們是很容易的,因?yàn)槟阒恍杳鎸?duì)一個(gè)大客戶,對(duì)方有一套明確的功能需求清單,等等。但涉及邊緣人工智能時(shí),你面對(duì)的是一千到幾千個(gè)客戶,每個(gè)客戶的要求都不同,而且許多客戶并不具備理解你的技術(shù)的背景,也難以按照自身需求對(duì)其進(jìn)行調(diào)整。
那么,你需要解決邊緣計(jì)算的問題就不同了。如果你想讓邊緣計(jì)算取得成功,你就需要有明顯的性價(jià)比高的硬件,因?yàn)槟阈枰杀拘б娓叩慕鉀Q方案,因?yàn)樵诔杀痉矫孢吘売?jì)算客戶比云端客戶更敏感。你需要提高能效,因?yàn)槟阌邢拗?。?shù)據(jù)中心沒有能源限制,通常它旁邊就是電廠。而在邊緣計(jì)算場(chǎng)景下,你會(huì)遇到一些限制。因此,你要注重能效。此外還要注重易用性。你需要提供即插即用的產(chǎn)品。客戶,有90%邊緣計(jì)算的客戶,他們不可能擁有百度那樣的工程師。公司的情況是不一樣的,對(duì)吧?因?yàn)樗麄兌际侵行⌒凸尽?/span>
所以你需要給他們提供所有的軟件棧和工具。讓他們能夠以容易簡單的方式高效地使用你的解決方案。這就是為什么要注重易用性。如今,例如,在邊緣領(lǐng)域,雖然英偉達(dá)的AI硬件產(chǎn)品很強(qiáng)大,比如性能以及平臺(tái),但它太貴了,限制了它的普及。你總不能想著在一個(gè)準(zhǔn)備賣 500 美元的機(jī)器人里面塞一顆價(jià)值 1000 美元的芯片對(duì)吧?這肯定不現(xiàn)實(shí)。
我發(fā)現(xiàn)有一些解決方案很好,但很貴,也有一些解決方案很便宜,但很難使用。找到一個(gè)好的折中方案很重要。
問:您能再談?wù)効删S護(hù)性嗎?另外,讓邊緣AI芯片和解決方案保持易用性有多重要?
答:我可以告訴你,首先,客戶用云端做所有的事情,甚至包括訓(xùn)練算法。如果你是一家中小企業(yè),想在人工智能領(lǐng)域有所作為,你必須接入亞馬遜或者百度之類的平臺(tái)。不管選擇哪家的服務(wù),總之你都得回到云端系統(tǒng),使用云端的那些典型工具。那么你從云端得到的是什么?是網(wǎng)絡(luò),是訓(xùn)練好的網(wǎng)絡(luò)和應(yīng)用程序。
那么問題就是如何在邊緣使用這些成果。我們Axelera AI必須為客戶提供一個(gè)簡單的軟件棧,讓他們?cè)谠贫俗龅氖虑樵谶吘壱材苓\(yùn)行。你要清楚,云端客戶也許知道什么是量化,但在邊緣側(cè),90%、95%的客戶既不知道也不關(guān)心FP32浮點(diǎn)型和整型之間有什么區(qū)別,所以這個(gè)坑只能我們來填:無論他們?cè)谠贫俗隽耸裁?,我們都必須在邊緣?cè)為他們提供使用相同應(yīng)用或相同網(wǎng)絡(luò)的工具。因此,邊緣服務(wù)提供商需要構(gòu)建一個(gè)足夠靈活的軟件棧,允許客戶繼續(xù)使用他們現(xiàn)在正在用的東西,只是部署在邊緣。我們應(yīng)該負(fù)責(zé)的是部署,而不是讓客戶重新開發(fā),因?yàn)榭蛻舨幌雽W(xué)習(xí)新東西。
如果你去找客戶說:聽著,我有很好的硬件,但你必須學(xué)習(xí)我的軟件。他們會(huì)說:不,我沒時(shí)間整這些,我不需要,憑啥要我從頭學(xué)起?你必須去找他們說:聽著,我有很棒的硬件和軟件棧,你要做的就是繼續(xù)用你習(xí)慣的方案,按下按鈕,直接就能跑;或者沿用你的方案,只要做這幾步,就能直接用。部署應(yīng)該做得非常簡單。這也是我認(rèn)為很多公司沒有考慮到的關(guān)鍵問題:他們認(rèn)為效率更重要。誠然,效率很重要,但并不是全部,你需要把效率、吞吐量、成本和軟件棧這幾方面放在一起考慮,因?yàn)榭蛻暨€關(guān)心總體擁有成本。如果你對(duì)客戶說:聽著,用我的芯片,你每年可以節(jié)省30萬歐元,但客戶更換軟件要花費(fèi)100萬歐元,那么可想而知,他們根本不會(huì)買。你必須從全局的高度來考量其中的利害。
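上文提到,多數(shù)邊緣客戶并不了解FP32與INT8之間的量化轉(zhuǎn)換,這一步需要由工具鏈代勞。下面用一個(gè)最簡的對(duì)稱線性量化草圖示意這件事(假設(shè)性示例,并非Axelera或任何特定工具鏈的實(shí)現(xiàn)):

```python
import numpy as np

def quantize_int8(x):
    """把 FP32 張量對(duì)稱量化到 INT8:用最大絕對(duì)值確定縮放因子 scale"""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """反量化:恢復(fù)為近似的 FP32 值"""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.3, 0.01], dtype=np.float32)  # 假設(shè)的激活值
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
print("最大量化誤差:", float(np.max(np.abs(x - x_hat))))
```

每個(gè)值的量化誤差不超過半個(gè)scale的量級(jí);對(duì)客戶來說,理想情況正是工具鏈自動(dòng)完成這種轉(zhuǎn)換并校驗(yàn)精度,而無需他們理解其中細(xì)節(jié)。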
問:邊緣 AI 所面對(duì)的各種工況是否意味著,其采用的芯片類型和數(shù)據(jù)中心所使用的芯片的側(cè)重點(diǎn)各不相同?邊緣 AI 可能更傾向于專芯專用?
答:消費(fèi)類邊緣設(shè)備是高度定制化的。例如電視,電視就是邊緣設(shè)備。它使用專有的解決方案,具有很多由人工智能驅(qū)動(dòng)的功能,因此它的SoC必須極其定制化,必須以非常特定的方式設(shè)計(jì),還必須保持低功耗,因?yàn)殡娨暠仨毜凸?,你不能往里面塞風(fēng)扇或者一臺(tái)計(jì)算機(jī)。所以電視是高度定制化的,手機(jī)也是如此,而且定制程度更高:手機(jī)由電池驅(qū)動(dòng),那么可能就不跑浮點(diǎn)網(wǎng)絡(luò),而是跑二值化網(wǎng)絡(luò),因?yàn)閷?duì)大多數(shù)用戶來說二值化網(wǎng)絡(luò)已經(jīng)足夠,他們對(duì)精度并不那么敏感。而到了自動(dòng)化領(lǐng)域,你必須找到合理的折中:有時(shí)仍然存在功耗限制,但你不能妥協(xié)到只用二值化網(wǎng)絡(luò),因?yàn)樽詣?dòng)化場(chǎng)景依賴高精度的結(jié)果。
你需要在效率、吞吐量和準(zhǔn)確性之間找到良好的平衡:盡管邊緣設(shè)備有種種限制,但仍要努力接近云計(jì)算級(jí)別的精度。因此邊緣方案同樣是定制的,但形式不同,是更具可編程性的解決方案。
而當(dāng)你轉(zhuǎn)向云端時(shí),正如你所說,云端什么都有。但你會(huì)發(fā)現(xiàn),云端的專用化程度也越來越高:在數(shù)據(jù)中心里,開始出現(xiàn)越來越多針對(duì)特定工作負(fù)載設(shè)計(jì)的專用機(jī)器。雖然云端的效率要求不像邊緣那樣嚴(yán)苛,但仍然是必要的。在邊緣,你會(huì)努力實(shí)現(xiàn)每瓦15 TOPS、20 TOPS甚至30 TOPS的能效;而在今天的云端,工作負(fù)載通常只以每瓦0.1 TOPS甚至更低的能效運(yùn)行,因?yàn)橥ㄓ糜?jì)算平臺(tái)的效率很低。因此,即使在數(shù)據(jù)中心,你也會(huì)看到越來越多的專用硬件,如張量處理單元(TPU)、GPU、CPU和ASIC,并根據(jù)工作負(fù)載的不同把任務(wù)分配給不同的硬件。這是我在數(shù)據(jù)中心看到的趨勢(shì)。
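按訪談中給出的能效數(shù)字,可以粗算同一工作負(fù)載在邊緣與通用云平臺(tái)上的功耗差距(工作負(fù)載的大小是假設(shè)值,能效取上文提到的量級(jí)):

```python
# 同一工作負(fù)載在不同能效平臺(tái)上的功耗估算
workload_tops = 2.0       # 假設(shè)某視覺工作負(fù)載需要 2 TOPS 的持續(xù)算力
edge_tops_per_w = 15.0    # 邊緣目標(biāo)能效:每瓦 15 TOPS(上文給出 15~30 的范圍)
cloud_tops_per_w = 0.1    # 通用云端平臺(tái):約每瓦 0.1 TOPS

edge_power = workload_tops / edge_tops_per_w    # 功耗 = 算力需求 / 能效
cloud_power = workload_tops / cloud_tops_per_w
print(f"邊緣功耗約 {edge_power:.2f} W,通用云平臺(tái)約 {cloud_power:.0f} W,"
      f"相差 {cloud_power / edge_power:.0f} 倍")
```

這解釋了為什么即便在數(shù)據(jù)中心,也會(huì)逐步把AI工作負(fù)載轉(zhuǎn)移到TPU、GPU等專用硬件上。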
問:邊緣 AI 碎片化的產(chǎn)品需求是否意味著更不容易被大公司壟斷,小公司會(huì)有更多機(jī)會(huì)?
答:是的,完全正確。從歷史上看確實(shí)是這樣。如果我們看云計(jì)算,在過去的20到30年里,一直是英特爾、AMD,最近又加入了英偉達(dá),實(shí)際上有2到3家公司占據(jù)了云計(jì)算市場(chǎng)的98%,新興公司或其他公司只占很小的份額。
但當(dāng)我們轉(zhuǎn)向邊緣計(jì)算領(lǐng)域,歷史上一直存在眾多參與者:英特爾、AMD、英偉達(dá)、高通、恩智浦、德州儀器、瑞薩電子、意法半導(dǎo)體、英飛凌、聯(lián)發(fā)科技、Cirrus Logic、Umbrella Silicone等等。邊緣市場(chǎng)更加專用化,有非常多不同的應(yīng)用領(lǐng)域,這導(dǎo)致了市場(chǎng)的碎片化,而大型企業(yè)并不喜歡這種情況??蛻敉枰囟ǖ膽?yīng)用處理器來滿足他們?cè)谶吘壴O(shè)備上的需求,這就是邊緣市場(chǎng)能容納更多參與者的原因。我預(yù)計(jì)邊緣領(lǐng)域也會(huì)出現(xiàn)整合,但不會(huì)像云計(jì)算那樣:邊緣領(lǐng)域的半導(dǎo)體公司體量較小,但數(shù)量會(huì)多得多,而云計(jì)算領(lǐng)域的玩家則相對(duì)較少。
問:您是如何看待 CUDA 的?
答:這是英偉達(dá)的成功。我的意思是,英偉達(dá)之所以有今天的成就,要?dú)w功于CUDA。如果沒記錯(cuò)的話,該技術(shù)是在2003、2004年左右開發(fā)出來的。起初人們對(duì)它持懷疑態(tài)度,因?yàn)橛ミ_(kāi)發(fā)CUDA是為了科學(xué)研究和并行計(jì)算等領(lǐng)域,而不是專門為了人工智能,然而它逐漸成為了該領(lǐng)域的參考標(biāo)準(zhǔn)。生態(tài)系統(tǒng)也非常重要:你會(huì)看到英偉達(dá)的CUDA,但也會(huì)看到PyTorch和TensorFlow等開源平臺(tái)。
此外,還有一些被廣泛使用的工具。因此我認(rèn)為,公司應(yīng)該始終把自己的架構(gòu)融入生態(tài)系統(tǒng)。對(duì)我們來說,我們無法進(jìn)入英偉達(dá)的生態(tài)內(nèi)部,因?yàn)槲覀兪歉?jìng)爭(zhēng)對(duì)手,但我們需要找到一種方法,讓英偉達(dá)的客戶能夠輕松使用我們的硬件,所以我們需要把自己的架構(gòu)接入各個(gè)框架的后端,因?yàn)樯鷳B(tài)系統(tǒng)就是一切。我們大致是這樣打比方的:芯片是汽車的發(fā)動(dòng)機(jī),汽車本身是系統(tǒng)或主板,軟件是駕駛員,而數(shù)據(jù)就是驅(qū)動(dòng)汽車的燃料。因此,你必須始終從整體上考慮這個(gè)系統(tǒng)。
如果你設(shè)計(jì)一臺(tái)發(fā)動(dòng)機(jī),你必須知道它將裝在哪款車上,必須知道誰是駕駛員,必須知道車用的是什么燃料。當(dāng)你設(shè)計(jì)某樣具體的產(chǎn)品時(shí),必須始終考慮整體情況,否則設(shè)計(jì)出來的東西就沒法用:如果你設(shè)計(jì)了錯(cuò)誤的發(fā)動(dòng)機(jī),試圖裝在錯(cuò)誤的車輛上,顯然行不通,對(duì)吧?所以你必須始終在生態(tài)系統(tǒng)的層面上思考。而且要想到,同一臺(tái)"發(fā)動(dòng)機(jī)"會(huì)面對(duì)成千上萬名不同的"駕駛員":我的芯片要能運(yùn)行數(shù)千種軟件,并與不同類型的數(shù)據(jù)打交道,比如用于卷積神經(jīng)網(wǎng)絡(luò)的圖像,或用于LSTM(長短期記憶網(wǎng)絡(luò))的音頻樣本。因此,你必須從一開始就牢記這一點(diǎn)。
問:您認(rèn)為在未來邊緣 AI 和數(shù)據(jù)中心的集中 AI 會(huì)相互融合相互配合?我們現(xiàn)在處于什么狀態(tài)?
答:云端與邊緣將始終保持整合。如今整合已經(jīng)實(shí)現(xiàn),因?yàn)槲覀兪冀K會(huì)在云端訓(xùn)練網(wǎng)絡(luò)。我的意思是,沒有必要在邊緣設(shè)備上訓(xùn)練網(wǎng)絡(luò),因?yàn)橛?xùn)練需要在有限的時(shí)間內(nèi)動(dòng)用大量計(jì)算資源,所以最高效的方式始終是使用云端。
然后在邊緣進(jìn)行微調(diào),因?yàn)檫吘壟c云端的網(wǎng)絡(luò)模型本來就緊密關(guān)聯(lián):你在云端訓(xùn)練,在邊緣微調(diào),再只把需要的相關(guān)數(shù)據(jù)發(fā)回云端,用于更新和優(yōu)化模型?,F(xiàn)在邊緣和云端的模型已經(jīng)關(guān)聯(lián)起來了,而且這種關(guān)聯(lián)將始終存在,因?yàn)榭倳?huì)有一些工作負(fù)載不適合在邊緣運(yùn)行,比如隨一天中不同時(shí)段波動(dòng)的工作負(fù)載。峰值負(fù)載就是一例:邊緣設(shè)備很難處理峰值,因?yàn)樗阈o法瞬間擴(kuò)展,也無法持續(xù)擴(kuò)展。如果我現(xiàn)在同時(shí)啟動(dòng)20個(gè)不同的應(yīng)用程序,系統(tǒng)會(huì)崩潰,因?yàn)樗鼡尾蛔∧敲炊嘭?fù)載。
而如果我決定連到云端同時(shí)運(yùn)行20個(gè)應(yīng)用程序,這根本不是問題,因?yàn)樵贫丝梢越o我分配足夠的計(jì)算資源。當(dāng)需要?jiǎng)討B(tài)分配工作負(fù)載時(shí),云端是最佳方案:不論是網(wǎng)站、科研還是其他應(yīng)用,你都需要在云端運(yùn)行。而我的期望是有更多的分散式計(jì)算:在邊緣提取盡可能多的信息,只把相關(guān)數(shù)據(jù)發(fā)送到云端。邊緣計(jì)算就像過濾器:我們同時(shí)產(chǎn)生大量垃圾和信息,邊緣設(shè)備應(yīng)該過濾掉垃圾,把真正的信息發(fā)到云端,充當(dāng)?shù)谝坏肋^濾器。最終,云端再生成資源并分發(fā)回邊緣設(shè)備。所以,云端和邊緣計(jì)算缺一不可。
問:實(shí)現(xiàn)如上愿景共有哪些難點(diǎn)(例如隱私保護(hù),協(xié)議的互通等)?
答:當(dāng)今集成的主要問題在于:云端資源近乎無限,而邊緣端自身存在種種限制,這是主要矛盾。你不能簡單地把云端應(yīng)用直接搬到邊緣設(shè)備上,而是需要進(jìn)行適應(yīng)性調(diào)整,這就是邊緣與云端集成的主要挑戰(zhàn)。你提到了一個(gè)重要觀點(diǎn),即安全和隱私問題,特別是在歐洲這樣的地區(qū)。我們?cè)跉W洲非常重視這一點(diǎn):我們有《通用數(shù)據(jù)保護(hù)條例》(GDPR),并且非常擔(dān)心過度共享數(shù)據(jù)。因此,在某些應(yīng)用中使用云端的問題越來越多,我們需要邊緣計(jì)算來解決隱私和安全問題。
但這并不是一件容易解決的事情,它涉及實(shí)施適當(dāng)?shù)能浖鉀Q方案。此外,我認(rèn)為真正的主要問題在于前面所說的很多在云端運(yùn)行的東西無法在邊緣設(shè)備上運(yùn)行,因?yàn)楸仨氝M(jìn)行重新編程、重新訓(xùn)練、重新適應(yīng)等操作,原因是硬件不同,對(duì)吧?
問:您認(rèn)為要克服這樣的問題,目前有什么可能的方法?(例如聯(lián)邦學(xué)習(xí),制定行業(yè)規(guī)范等)
答:問題通常在于缺乏標(biāo)準(zhǔn),沒有標(biāo)準(zhǔn)的協(xié)議。你提到的聯(lián)邦學(xué)習(xí)是個(gè)好東西,但每個(gè)參與者都有自己的實(shí)現(xiàn)方式,并受限于各自使用的硬件。因此你需要標(biāo)準(zhǔn)化,需要把設(shè)備規(guī)范統(tǒng)一起來,才能讓不同設(shè)備相互連接。如今大公司不喜歡標(biāo)準(zhǔn)化,希望保持封閉生態(tài)以確保盈利,因此許多玩家沒有興趣開放源代碼或開放與其他設(shè)備的連接。這就是問題所在,也就是互聯(lián)互通的問題。聯(lián)邦學(xué)習(xí)是個(gè)好東西,但目前沒有真正的標(biāo)準(zhǔn),仍處于起步階段。安全方面已經(jīng)有一些方案,比如有些公司正嘗試用同態(tài)加密來解決數(shù)據(jù)隱私和安全問題。要解決這些重大問題,需要讓所有參與者坐到一起,就標(biāo)準(zhǔn)化達(dá)成一致:就像我們都同意用PCIe、USB這些總線連接設(shè)備一樣,我們還應(yīng)該在協(xié)議層和應(yīng)用層達(dá)成共識(shí)。但人們并不喜歡這樣做,因?yàn)樗麄兏朐趹?yīng)用層競(jìng)爭(zhēng):沒人在乎在PCIe、USB這樣的低層級(jí)上競(jìng)爭(zhēng),但對(duì)大公司來說,高層級(jí)的競(jìng)爭(zhēng)非常重要,這讓事情變得復(fù)雜而困難。我認(rèn)為政府可以發(fā)揮作用,通過強(qiáng)制性規(guī)定來規(guī)范,比如規(guī)定某種方式必須如此,否則不允許使用。如果我們真想讓聯(lián)邦學(xué)習(xí)取得成功,這可能是實(shí)現(xiàn)目標(biāo)的唯一方法。
問:Axelera AI 總部位于埃因霍溫,這是一座高科技實(shí)力雄厚的小城市。 埃因霍溫憑借怎樣的煉金術(shù)打造出如此偉大的半導(dǎo)體產(chǎn)業(yè)集群?
答:埃因霍溫因飛利浦而興:飛利浦這個(gè)品牌源自埃因霍溫,飛利浦先生本人也來自這座城市,他對(duì)這座城市貢獻(xiàn)巨大?;仡櫄v史你會(huì)發(fā)現(xiàn),飛利浦實(shí)際上是臺(tái)積電(TSMC)的聯(lián)合創(chuàng)始公司之一:臺(tái)積電是由飛利浦半導(dǎo)體和臺(tái)灣省政府共同創(chuàng)辦的。
而飛利浦將知識(shí)產(chǎn)權(quán)授予了臺(tái)積電,并把光刻部門分拆出來成立了ASML,如今是全球最重要的光刻機(jī)制造商。飛利浦還把半導(dǎo)體部門分拆出去,也就是如今的恩智浦。后來恩智浦收購了飛思卡爾(Freescale),成長為一家大型企業(yè),這其中也有飛利浦的淵源。此外,飛利浦自身還是醫(yī)療保健領(lǐng)域的重要角色,是全球最大的醫(yī)療保健公司之一。在埃因霍溫,一切都圍繞著飛利浦展開:ASML、恩智浦和飛利浦構(gòu)建了極其強(qiáng)大的生態(tài)系統(tǒng)。埃因霍溫可謂"風(fēng)水寶地",隔壁比利時(shí)魯汶還有校際微電子中心(imec),一個(gè)推動(dòng)納米技術(shù)發(fā)展的巨型研究中心。
因此,埃因霍溫地區(qū)絕對(duì)是理想之地。在Axelera AI,正如你所知,我們擁有來自英特爾埃因霍溫團(tuán)隊(duì)的專業(yè)人士,還有來自蘇黎世聯(lián)邦理工學(xué)院(ETH Zurich)的人員。我們?cè)谌鹗吭O(shè)有一個(gè)大型辦公室,還有來自IBM蘇黎世實(shí)驗(yàn)室的人員。此外,我們還有由來自五家制造商的人才組成的存內(nèi)計(jì)算團(tuán)隊(duì)。我們Axelera AI在歐洲各地都有員工:目前共有140名員工,其中超過50名擁有博士學(xué)位,分布在歐洲不同地區(qū)。我們致力于招聘在歐洲能找到的最優(yōu)秀的人才。
問:您認(rèn)為對(duì)于半導(dǎo)體初創(chuàng)公司來說,最需要什么樣的支持(上下游供應(yīng)鏈?資金支持?人才生態(tài)等等)?
答:對(duì)于我們這樣的公司來說,我認(rèn)為雇傭人才并不困難,因?yàn)槲覀儚氖碌墓ぷ骱芸幔何覀冊(cè)跉W洲開發(fā)尖端的人工智能芯片。在歐洲,從事這類業(yè)務(wù)的公司并不多,因?yàn)闅W洲的大型企業(yè)聚焦在其他領(lǐng)域,比如恩智浦、意法半導(dǎo)體、英飛凌和博世,它們?cè)谄嚭凸I(yè)領(lǐng)域非常強(qiáng)大,但在人工智能方面并不那么強(qiáng)。我們給人們提供了加入一家構(gòu)建人工智能的公司的機(jī)會(huì),他們非常樂意加入,我們也因此擁有非常有才華的員工,因?yàn)樗麄兿矚g我們所從事的工作。對(duì)我們這樣的創(chuàng)業(yè)公司來說,最困難的是融資:我們既不在美國也不在中國,而在歐洲,硬件項(xiàng)目很難獲得資金。軟件融資容易,硬件卻非常困難,因?yàn)橥顿Y者持懷疑態(tài)度,過去幾十年里歐洲對(duì)硬件領(lǐng)域投資不多。
而現(xiàn)在,我們有了《歐洲芯片法案》,由于政府認(rèn)識(shí)到硬件的重要性,情況有所改善,但仍然極為困難,這是第一點(diǎn)。第二,獲取數(shù)據(jù)非常困難:由于隱私等原因,小公司很難拿到數(shù)據(jù),沒有地方獲取數(shù)據(jù),也沒有可供訪問的數(shù)據(jù)庫。眼下這對(duì)我們問題不大,因?yàn)槲覀冎皇窃O(shè)計(jì)芯片;但從長遠(yuǎn)看,你需要能夠獲取數(shù)據(jù),用來訓(xùn)練網(wǎng)絡(luò)、做研究等等,而目前只有大公司才能獲取廣泛的數(shù)據(jù)集。第三個(gè)挑戰(zhàn)是政治局勢(shì):美中緊張關(guān)系對(duì)歐洲也產(chǎn)生了影響,盡管歐洲與中國歷史上一直保持著良好的關(guān)系。這種政治緊張局勢(shì)會(huì)引發(fā)不確定性和擔(dān)憂,即使對(duì)我們這樣的公司也是如此。人們害怕這種不確定性,這是一個(gè)問題,因?yàn)檎谓o公司的擴(kuò)張和關(guān)系的建立帶來了風(fēng)險(xiǎn)和限制。我認(rèn)為這是一個(gè)非常嚴(yán)重、需要解決的問題,因?yàn)樗鼘?duì)每個(gè)人和經(jīng)濟(jì)增長都有負(fù)面影響。
我理解您的觀點(diǎn),因?yàn)樵谖铱磥磉@個(gè)法律有些嚴(yán)格。我的意思是,如果我創(chuàng)造了一些東西并分享出來,你使用了它,我就應(yīng)該得到一些回報(bào)。就像歐洲的網(wǎng)絡(luò)新聞一樣,歐盟最終出手干預(yù),告訴包括Facebook在內(nèi)的大型企業(yè):如果想使用新聞,就必須付費(fèi)。一方面,我們應(yīng)該建立這樣一種付酬機(jī)制。
另一方面,我們也應(yīng)該獲得數(shù)據(jù)的訪問權(quán)限,我認(rèn)為應(yīng)該尋求一種折中方案。我知道日本采取了比較極端的做法,但這是唯一的出路:沒有數(shù)據(jù),就無法打造人工智能。人們?cè)跒閿?shù)據(jù)問題抱怨:在中國,大公司可以輕松獲取數(shù)據(jù)并進(jìn)行訓(xùn)練;美國的情況非常相似;而在歐洲,由于GDPR等規(guī)定,情況更加困難,也非常危險(xiǎn)。我們必須找到一種獲取數(shù)據(jù)的方式,哪怕是加密數(shù)據(jù)也可以,因?yàn)槲覀儾⒉辉谝鈹?shù)據(jù)的具體內(nèi)容。我的意思是,你可以對(duì)數(shù)據(jù)加密、刪除敏感信息,然后提供剩下的部分。如果是有版權(quán)的數(shù)據(jù),我完全支持以征稅或直接付費(fèi)的方式回報(bào)原始數(shù)據(jù)提供方。我完全贊同,但我們必須獲得數(shù)據(jù)。
以下是采訪原文(英文):
Q:what are the characteristics of the in-memory computing compared to the von Neumann architectures? And what advantages does it have over the traditional architectures?
A:Yes. Thanks for asking. In-memory computing has the advantage that you can parallelize computations in a unique way, because essentially you are transforming a memory array, which is typically large (it can be 260,000 elements or a million elements), and using this as a computational engine.
Then the advantage is that you have high parallelization, which means high throughput, low data movement, because you compute the calculation in the memory, which means low power consumption and low cost, because you merged the memory area with the computing element.
And then the area is smaller, which means low cost for the chip. And there are two kinds of in-memory computing: one is analog in-memory computing, the other is digital in-memory computing. In analog in-memory computing, you use the relationship between the current and the voltage that you have in the transistors, in the memory cell, to do the computations, to do the vector-matrix multiplication, which you have in all neural networks.
And this is one way, right? But when you work in the analog domain, the data comes in as digital data. You convert it to analog, you do the computation, then you convert it back to digital. The problem of analog in-memory computing is that there is noise in the analog domain, and the noise changes the result of the calculations. That is why typical analog in-memory computing chips don't have high accuracy and high precision: you have to fine-tune the network and fine-tune the silicon to get back a decent accuracy. At Axelera, we have this technology, but we don't use it. We use digital in-memory computing, which is different, because we don't convert and we don't do calculations in analog. We just take the SRAM cell, and close to each cell we embed a computing element to do the multiplication. Then we have an adder tree that makes the accumulations. This allows us to make calculations in the digital domain, to put together the memory and the computation in a small area, and also to parallelize the computation.
And then we have a very high throughput; low cost, because the chip is small; low data movement, which means low power consumption; and high precision, because we stay in digital.
Q:Is in-memory computing more suitable for special-purpose computing for specific algorithms rather than general-purpose computing?
A:Definitely. You use in-memory computing only to do one thing: multiplication between a vector and a matrix. And if you look inside the neural networks (recursive neural networks, convolutional neural networks, LSTM networks, transformer networks), 70 to 90% of the calculations are just vector-matrix multiplication.
And you use in-memory computing to do all of it; in-memory computing can do all of this. But you cannot do activation functions; you don't do those with in-memory computing. You just do the multiplications and the accumulations, the sums when you have to sum up the numbers. That's it. But these calculations represent 70% to 90% of what you have in any neural network, and this is the reason why it's important to use it in AI, machine learning, and deep learning.
But you don't use in-memory computing in any other domain, unless you have to do vector-matrix multiplication.
Q:Do you think in-memory computing is a solution to break through the memory wall?
A:In-memory computing is the solution for the vector matrix multiplication, not more than this.
To break the memory wall, there are other approaches, such as near-memory computing, which is slightly different: you have a more generic computing element, very small, and you put it close to the memory.
Then instead of having a large CPU and a large memory, you have thousands of small CPUs with thousands of small memories close by. I think this is the best solution to solve the memory wall, but it's not really in-memory computing; it is near-memory computing. The difference is that in near-memory computing, you still have an array of memory and a computing element, while in in-memory computing, you break down the array of memory and you put the computing elements inside the array. You can use it only if you do multiplication and accumulation; otherwise, it's useless.
Q:Can RISC-V be part of the vision of "democratization of Artificial Intelligence"?
A:Yes, it is. RISC-V is one element. In general, at Axelera, we try to keep our software stack as open as we can. We are using open-source code. We are using TVM in the back end of the compiler. We are using Zephyr in the firmware, which is an open-source project supported by Intel. We also tried to use oneAPI, and we are trying to use as much open source as possible, and also to give back to the community.
At Axelera, many of our guys are very active in the RISC-V communities, and we want to give back to the community. We want to develop things, create our own architecture and our own product, but still based on open source. But I think that when I say we want to democratize AI, it also means that we want to have a product which is powerful, usable, and low cost.
For example, if you take the solution that we designed, which is a chip of more than 200 TOPS, we are positioning it in a card at $149, because we want people to use it. We want to give people access to a powerful solution. There is always time to make money, but the first thing is to have people create great things using our technology. If they succeed, we succeed. Then we think it's important to have something that's easy to use, high performance, low cost, that you can buy online and get everywhere in the world, so you can build great products around it. We want to unleash innovation.
Q:As for AI accelerators, what are the advantages of using RISC-V?
A:Well, the advantage is that we can control it. Because it's open source, we can design it, we can control it. We don't have to go back to anyone and ask permission or ask for the source code of the compiler. If you use whatever IP from Cadence or Synopsys, it doesn't matter: you cannot access everything, and you start to rely on them. And this can be a problem in the long run. Therefore, with RISC-V, you can completely control your architecture. And it's a platform which is tested by a large community, which is good. And you can extend and develop it. For example, we are developing a specific vector instruction set unit, which will be integrated in the next generation. And we can do it by ourselves, because we have the knowledge. And since it's an open-source platform, we don't have to negotiate with a supplier to solve the problem.
Q:Do you think AI applications will become an important driving force for the RISC-V ecosystem?
A:I think it's easier to use RISC-V in an application-specific chip than in a general-purpose one. Because in an application-specific chip, you can use RISC-V and optimize it for what you want to do.
And then you have to verify it only for what you want to do. But if you want to use RISC-V as a general-purpose processor, and you want to use it to compete with a cutting-edge Intel CPU or a cutting-edge AMD CPU, then it is a different story. It's way more difficult, and it requires way more resources and way more time, because it's a new architecture and it is not so highly verified by everybody. In the sense that when you go to complex things, you need an ecosystem around it: you need the drivers, you need support from the community, from Microsoft, from Ubuntu, from Linux. In general, then, it becomes more difficult. Then I think that RISC-V will grow now thanks to AI. And it will still take 5 to 10 years to become a real general-purpose alternative to what you have today. It will take time to have RISC-V running in a mobile phone.
Q:In terms of computing efficiency, maybe the data centers have better infrastructure and more constant computing power. So why do we need Edge AI?
A:You don't need Edge AI for efficiency; as you said, that is correct. The data center is way better, because you concentrate everything, especially for utilization; more than the efficiency itself, utilization is way higher in the data center, right? But you need Edge AI because of privacy, security of data, safety, economics. Think about it: you cannot have a car that is asking the cloud, should I turn right or left? Your car needs to have the computing power to react on time, without latency, to whatever is happening, almost without checking with the cloud. Even because in some areas, you don't even have coverage.
Second of all, it doesn't make sense even from an economics point of view to send everything to the cloud. Think about surveillance, sophisticated surveillance systems, where you have plenty of high-resolution cameras. It's extremely expensive to take all this data and send it to the cloud, because 95% or 98% of this data is useless. What is useful is what you want to understand, what you want to identify: I don't know, the baggage that someone dropped in a railway station, or the specific person that is running and the police are looking for. For the things you don't have to know, why should you send all the data to the cloud? You can extract the right information at the edge; then it's even cheaper to do it. And still, in many areas there is not even coverage; actually, you don't even have a good connection. Then there is still an infrastructure problem where you can't solve everything by sending data to the cloud, so edge computing makes sense. It's necessary for many different applications: drones, robotics, cars, automotive, and even surveillance, actually.
Q: What problems do we need to overcome for Edge AI solutions? You have already mentioned power consumption; maybe the platform doesn't have so much power. And what about the operating conditions, the light, the latency, the cost, or the maintainability?
A:Yeah, I think that the obstacle, for me, is different. In the cloud, you have few players: in China, for example, you have two, three, four cloud providers, and the same in the United States and in Europe. The Chinese and the Americans are leading the cloud in terms of providers. And there are a few companies building up large data centers and providing services.
Therefore, it's easy to design a technology and provide it to them, because you have one big customer with one set, a list of features that they need, requisites and so on. But when it comes to the edge, you have 1,000 or several thousand customers, each of them asking for different things. And many of these customers don't have the background to understand your technology and to twist it in the way they need.
Then the problems you need to solve at the edge are different. If you want the edge to succeed, you need to have clearly cost-effective hardware, because the edge customer is more sensitive than a cloud customer in terms of cost. You need to have power efficiency, because you have constraints. In the data center you have no constraints; you have a power plant close by. But at the edge, you have some constraints. You have to be efficient, but also usable: you need to have something that is plug and play. Customers…… 90% of the customers at the edge cannot have the engineers that Baidu can have. It's different, right? Because they are medium and small companies.
And then you need to give them all the software stack, all the instruments to use your solution very efficiently, but in an easy, simple way. Today, for example, at the edge, Nvidia has great hardware in terms of performance and platform, but it's too expensive to scale. You can't use $1,000 hardware in a small robot that you want to sell at $500, right? You simply can't.
And then I think there are solutions which are good, but expensive or there are solution that are cheap, but it is difficult to be used. And it's important to find a good compromise.
Q:So can you tell us more about maintainability? And how important is it to make Edge AI chips and Edge AI solutions easy to use?
A:I can tell you, first of all, customers use the cloud to do everything, even to train the algorithms. If you are a small or medium enterprise and you want to do something in AI, you have to connect to Amazon or Baidu or whatever. It doesn't matter which kind of player; you have to go back to the cloud system and use the typical tools that you have in the cloud. What do you get out of it? A network, a trained network, and applications.
Then the problem is how to use this at the edge. Then we at Axelera have to give customers a simple software stack which allows them to take what they did in the cloud and run it at the edge. You have to be sure: the customer in the cloud may know what quantization is, but at the edge, 90% or 95% of the customers don't know and don't care what the difference is between floating point 32 and int8. We have to solve that problem. They should do whatever they want in the cloud, and then we have to give them the tools to use the same application or the same network at the edge. Then an edge provider needs to build up a software stack which allows customers to use what they are using today, but deployed at the edge. Companies like ours should be responsible for the deployment, not for the development, because customers don't want to learn new things.
If you go to a customer and you say, listen, I have great hardware, but you have to learn my software, they will say: no, I don't have time, I don't want to, why should I do it? You have to go to them and say: listen, I have a great hardware and software stack; what you have to do is just take what you have, push a button, and it runs. Or take what you have, do these few steps, and it runs. It should be very simple. And this is the key aspect that a lot of companies, I think, don't think about. They think it's important to be efficient. Yes, efficiency is important, but it's not only that: you need a mix of things, efficiency, throughput, cost, and the software stack, even because the customer cares about the total cost of ownership. If you go to a customer and say, listen, with my chip you save, I don't know, 300,000 euros per year, but the customer needs to spend 1 million to change the software, then they simply will not do it. Then you have to think about the big picture and its implications.
Q:So, is the data center more inclined to use general-purpose AI computing power, while edge AI chips may go the other way and be designed for a specific use case, customized, something like that?
A:If you go to the consumer edge, it's super customized. In a television, for example: your television is edge. You have a proprietary solution, and you have a lot of features that are AI-driven, so it's super customized. The SoC has to do a few things in a very specific way. It has to be low power consumption, because the television must be at low power consumption; it cannot have a fan or a computer running inside. Then it's highly customized. In the phone it's the same; the phone is super customized, it's battery powered, so probably you don't run a floating-point network, you run binary networks, and it's good enough, because the customers are not really sensitive. When you go to automation, you have to find a good compromise, because you still have limitations in power sometimes, but you cannot compromise and say, I use binary networks, because if you are using automation and networks, you should have high accuracy.
And then you have to find a good compromise between efficiency, throughput, and accuracy: you have some limitations at the edge, but you still try to reach the precision that you have in cloud computing. Then it's still customized, but it's different; it's a more programmable solution.
Then when you go to the cloud, as you said, in the cloud you have everything. But in the cloud, if you look, there is more and more specialization. The difference is that in the data center, you start to have more and more specialized machines for specialized workloads. Because even there, there is a need for efficiency; not like at the edge, but still it's necessary. At the edge, you try to get 15 TOPS per watt, 20, 30, whatever. In the cloud today, the workloads are running at 0.1 TOPS per watt or even less, because if you take a general computing platform, it has very low efficiency. And then even in the data center, you see the trend to have tensor processing units, GPUs, CPUs, etc. It's a kind of trend, and based on the workload, they start to allocate to the different hardware. Then I see this trend in the data center too.
Q:Does the fragmented product demand of edge AI indicate that it's less likely to be monopolized by large, centralized companies such as Nvidia or AMD, and that small companies might have more opportunities in this area?
A:Yes, absolutely. Traditionally, it's like this. If you think about cloud computing, in the last 20, 30 years you always had Intel, AMD, and more recently Nvidia; then you have, and you still have, actually 2 or 3 players that are dominating 98% of the cloud, and a very small portion for new startups or other players.
But if you go to the edge, historically, you have plenty of players, because you still have Intel, AMD, Nvidia, Qualcomm, NXP, Texas Instruments, Renesas, ST Microelectronics, Infineon, MediaTek, Cirrus Logic, Umbrella Silicone. I can go on, right? You have a lot of players, because as you said, the edge is more specialized; you have plenty of applications. It's very fragmented, and the big players, they don