diff --git a/images/cover_security_privacy.png b/images/cover_security_privacy.png new file mode 100644 index 000000000..7688f1b5c Binary files /dev/null and b/images/cover_security_privacy.png differ diff --git a/images/security_privacy/image1.png b/images/security_privacy/image1.png new file mode 100644 index 000000000..790bca9d5 Binary files /dev/null and b/images/security_privacy/image1.png differ diff --git a/images/security_privacy/image10.png b/images/security_privacy/image10.png new file mode 100644 index 000000000..ee3716069 Binary files /dev/null and b/images/security_privacy/image10.png differ diff --git a/images/security_privacy/image11.png b/images/security_privacy/image11.png new file mode 100644 index 000000000..ae665c5f2 Binary files /dev/null and b/images/security_privacy/image11.png differ diff --git a/images/security_privacy/image12.png b/images/security_privacy/image12.png new file mode 100644 index 000000000..4fc933f31 Binary files /dev/null and b/images/security_privacy/image12.png differ diff --git a/images/security_privacy/image13.png b/images/security_privacy/image13.png new file mode 100644 index 000000000..463a4d6e3 Binary files /dev/null and b/images/security_privacy/image13.png differ diff --git a/images/security_privacy/image14.png b/images/security_privacy/image14.png new file mode 100644 index 000000000..bdfe2b108 Binary files /dev/null and b/images/security_privacy/image14.png differ diff --git a/images/security_privacy/image15.png b/images/security_privacy/image15.png new file mode 100644 index 000000000..00265834e Binary files /dev/null and b/images/security_privacy/image15.png differ diff --git a/images/security_privacy/image16.png b/images/security_privacy/image16.png new file mode 100644 index 000000000..051633896 Binary files /dev/null and b/images/security_privacy/image16.png differ diff --git a/images/security_privacy/image2.png b/images/security_privacy/image2.png new file mode 100644 index 000000000..696e9d2ad Binary files 
/dev/null and b/images/security_privacy/image2.png differ diff --git a/images/security_privacy/image3.png b/images/security_privacy/image3.png new file mode 100644 index 000000000..6dd6c3903 Binary files /dev/null and b/images/security_privacy/image3.png differ diff --git a/images/security_privacy/image4.png b/images/security_privacy/image4.png new file mode 100644 index 000000000..cc41c4c2c Binary files /dev/null and b/images/security_privacy/image4.png differ diff --git a/images/security_privacy/image5.png b/images/security_privacy/image5.png new file mode 100644 index 000000000..5f2ecd07e Binary files /dev/null and b/images/security_privacy/image5.png differ diff --git a/images/security_privacy/image6.png b/images/security_privacy/image6.png new file mode 100644 index 000000000..74438971a Binary files /dev/null and b/images/security_privacy/image6.png differ diff --git a/images/security_privacy/image7.png b/images/security_privacy/image7.png new file mode 100644 index 000000000..c29612b8e Binary files /dev/null and b/images/security_privacy/image7.png differ diff --git a/images/security_privacy/image8.png b/images/security_privacy/image8.png new file mode 100644 index 000000000..a9740d61a Binary files /dev/null and b/images/security_privacy/image8.png differ diff --git a/images/security_privacy/image9.png b/images/security_privacy/image9.png new file mode 100644 index 000000000..44c3dfe90 Binary files /dev/null and b/images/security_privacy/image9.png differ diff --git a/ops.qmd b/ops.qmd index 8ee71467d..102b29124 100644 --- a/ops.qmd +++ b/ops.qmd @@ -234,7 +234,7 @@ Tight coupling between ML model components makes isolating changes difficult. Mo ### Correction Cascades -![Figure 14.3: The flowchart depicts the concept of correction cascades in the ML workflow, from problem statement to model deployment. 
The arcs represent the potential iterative corrections needed at each stage of the workflow, with different colors corresponding to distinct issues such as interacting with physical world brittleness, inadequate application-domain expertise, conflicting reward systems, and poor cross-organizational documentation. The red arrows indicate the impact of cascades, which can lead to significant revisions in the model development process, while the dotted red line represents the drastic measure of abandoning the process to restart. This visual emphasizes the complex, interconnected nature of ML system development and the importance of addressing these issues early in the development cycle to mitigate their amplifying effects downstream. [@data_cascades]](images/ai_ops/data_cascades.png) +![Figure 14.3: The flowchart depicts the concept of correction cascades in the ML workflow, from problem statement to model deployment. The arcs represent the potential iterative corrections needed at each stage of the workflow, with different colors corresponding to distinct issues such as interacting with physical world brittleness, inadequate application-domain expertise, conflicting reward systems, and poor cross-organizational documentation. The red arrows indicate the impact of cascades, which can lead to significant revisions in the model development process, while the dotted red line represents the drastic measure of abandoning the process to restart. This visual emphasizes the complex, interconnected nature of ML system development and the importance of addressing these issues early in the development cycle to mitigate their amplifying effects downstream. [@sculley2015hidden]](images/ai_ops/data_cascades.png) Building models sequentially creates risky dependencies where later models rely on earlier ones. For example, taking an existing model and fine-tuning it for a new use case seems efficient. 
However, this bakes in assumptions from the original model that may eventually need correction. diff --git a/privacy_security.qmd b/privacy_security.qmd index f11980c86..7f7d70761 100644 --- a/privacy_security.qmd +++ b/privacy_security.qmd @@ -1,81 +1,1061 @@ -# Privacy and Security +# Security & Privacy + +![_DALL·E 3 Prompt: An illustration on privacy and security in machine learning systems. The image shows a digital landscape with a network of interconnected nodes and data streams, symbolizing machine learning algorithms. In the foreground, there's a large lock superimposed over the network, representing privacy and security. The lock is semi-transparent, allowing the underlying network to be partially visible. The background features binary code and digital encryption symbols, emphasizing the theme of cybersecurity. The color scheme is a mix of blues, greens, and grays, suggesting a high-tech, digital environment._](./images/cover_security_privacy.png) + +Ensuring security and privacy is a critical concern when developing real-world machine learning systems. As machine learning is increasingly applied to sensitive domains like healthcare, finance, and personal data, protecting confidentiality and preventing misuse of data and models becomes imperative. Anyone aiming to build robust and responsible ML systems must have a grasp of potential security and privacy risks such as data leaks, model theft, adversarial attacks, bias, and unintended access to private information. We also need to understand best practices for mitigating these risks. Most importantly, security and privacy cannot be an afterthought and must be proactively addressed throughout the ML system development lifecycle - from data collection and labeling to model training, evaluation, and deployment. Embedding security and privacy considerations into each stage of building, deploying and managing machine learning systems is essential for safely unlocking the benefits of AI. 
::: {.callout-tip} + ## Learning Objectives -* coming soon. +* Understand key ML privacy and security risks like data leaks, model theft, adversarial attacks, bias, and unintended data access. + +* Learn from historical hardware and embedded systems security incidents. + +* Identify threats to ML models like data poisoning, model extraction, membership inference, and adversarial examples. + +* Recognize hardware security threats to embedded ML spanning hardware bugs, physical attacks, side channels, counterfeit components, etc. + +* Explore embedded ML defenses like trusted execution environments, secure boot, physical unclonable functions, and hardware security modules. + +* Discuss privacy issues in handling sensitive user data with embedded ML, including regulations. + +* Learn privacy-preserving ML techniques like differential privacy, federated learning, homomorphic encryption, and synthetic data generation. + +* Understand tradeoffs between privacy, accuracy, efficiency, threat models, and trust assumptions. + +* Recognize the need for a cross-layer perspective spanning electrical, firmware, software, and physical design when securing embedded ML devices. ::: ## Introduction -Explanation: In this section, we will set the stage for the readers by introducing the critical role of privacy and security in embedded AI systems. Understanding the foundational concepts is essential to appreciate the various nuances and strategies that will be discussed in the subsequent sections. +Machine learning has evolved substantially from its academic origins, where privacy was not a primary concern. As ML migrated into commercial and consumer applications, the data became more sensitive - encompassing personal information like communications, purchases, and health data. This explosion of data availability fueled rapid advancements in ML capabilities. 
However, it also exposed new privacy risks, as demonstrated by incidents like the [AOL data leak in 2006](https://en.wikipedia.org/wiki/AOL_search_log_release) and the [Cambridge Analytica](https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html) scandal. + +These events highlighted the growing need to address privacy in ML systems. In this chapter, we explore privacy and security considerations together, as they are inherently linked in ML: + +- Privacy refers to controlling access to sensitive user data, such as financial information or biometric data collected by an ML application. + +- Security protects ML systems and data from hacking, theft, and misuse. + +For example, an ML-powered home security camera must secure video feeds against unauthorized access. It also needs privacy protections to ensure only intended users can view the footage. A breach of either security or privacy could expose private user moments. + +Embedded ML systems like smart assistants and wearables are ubiquitous and process intimate user data. However, their computational constraints often prevent heavy security protocols. Designers must balance performance needs with rigorous security and privacy standards tailored to embedded hardware limitations. + +This chapter provides essential knowledge for addressing the complex privacy and security landscape of embedded ML. We will explore vulnerabilities and cover various techniques that enhance privacy and security within the resource constraints of embedded systems. + +We hope you will gain the principles to develop secure, ethical, embedded ML applications by building a holistic understanding of risks and safeguards. + +## Terminology + +In this chapter, we will be talking about security and privacy together, so there are key terms that we need to be clear about. + +- **Privacy:** For instance, consider an ML-powered home security camera that identifies and records potential threats. 
This camera records identifiable information, including faces, of individuals who approach, and potentially enter, this home. Privacy concerns may surround who can access this data. + +- **Security:** Consider an ML-powered home security camera that identifies and records potential threats. The security aspect would involve ensuring that these video feeds and recognition models aren't accessible to hackers. + +- **Threat:** Using our home security camera example, a threat could be a hacker trying to gain access to live feeds or stored videos, or using false inputs to trick the system. + +- **Vulnerability:** A common vulnerability might be a poorly secured network through which the camera connects to the internet, which could be exploited to access the data. + +## Historical Precedents + +While the specifics of machine learning hardware security can be distinct, the embedded systems field has a history of security incidents that provide critical lessons for all connected systems, including those using ML. Here are detailed explorations of past breaches: + +### Stuxnet + +In 2010, something unexpected was found on a computer in Iran - a very complicated computer virus that experts had never seen before. [Stuxnet](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/200661/Cyber-Reports-2017-04.pdf) was a malicious computer worm that targeted supervisory control and data acquisition (SCADA) systems and was designed to damage Iran's nuclear program [@farwell2011stuxnet]. Stuxnet was using four "[zero-day exploits](https://en.wikipedia.org/wiki/Zero-day_(computing))" - attacks that take advantage of secret weaknesses in software that no one knows about yet. This made Stuxnet very sneaky and hard to detect. + +But Stuxnet wasn't designed to steal information or spy on people. Its goal was physical destruction - to sabotage centrifuges at Iran's Natanz nuclear plant! 
So how did the virus get onto computers at the Natanz plant, which was supposed to be disconnected from the outside world for security? Experts think someone inserted a USB stick containing Stuxnet into the internal Natanz network. This allowed the virus to "jump" from an outside system onto the isolated nuclear control systems and wreak havoc. + +Stuxnet was incredibly advanced malware built by national governments to cross from the digital realm into real-world infrastructure. It specifically targeted important industrial machines, where embedded machine learning is highly applicable, in a way never done before. The virus provided a wake-up call about how sophisticated cyberattacks could now physically destroy equipment and facilities. + +This breach was significant due to its sophistication; Stuxnet specifically targeted programmable logic controllers (PLCs) used to automate electromechanical processes such as the speed of centrifuges for uranium enrichment. The worm exploited vulnerabilities in the Windows operating system to gain access to the Siemens Step7 software controlling the PLCs. Despite not being a direct attack on ML systems, Stuxnet is relevant for all embedded systems as it showcases the potential for state-level actors to design attacks that bridge the cyber and physical worlds with devastating effects. + +### Jeep Cherokee Hack + +The Jeep Cherokee hack was a groundbreaking event demonstrating the risks inherent in increasingly connected automobiles [@miller2019lessons]. In a controlled demonstration, security researchers remotely exploited a vulnerability in the Uconnect entertainment system, which had a cellular connection to the internet. They were able to control the vehicle's engine, transmission, and brakes, alarming the automotive industry into recognizing the severe safety implications of cyber vulnerabilities in vehicles. 
+ +{{< video https://www.youtube.com/watch?v=MK0SrxBC1xs&ab_channel=WIRED title="Hackers Remotely Kill a Jeep on a Highway" >}} + +While this wasn't an attack on an ML system per se, the reliance of modern vehicles on embedded systems for safety-critical functions has significant parallels to the deployment of ML in embedded systems, underscoring the need for robust security at the hardware level. + +### Mirai Botnet + +The Mirai botnet involved the infection of networked devices such as digital cameras and DVR players [@antonakakis2017understanding]. In October 2016, the botnet was used to conduct one of the largest [DDoS](https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/) attacks ever, disrupting internet access across the United States. The attack was possible because many devices used default usernames and passwords, which were easily exploited by the Mirai malware to control the devices. + +{{< video https://www.youtube.com/watch?v=1pywzRTJDaY >}} + +Although the devices were not ML-based, the incident is a stark reminder of what can happen when numerous embedded devices with poor security controls are networked together, a situation that is becoming more common with the growth of ML-based IoT devices. + +### Implications + +These historical breaches demonstrate the cascading effects of hardware vulnerabilities in embedded systems. Each incident offers a precedent for understanding the risks and designing better security protocols. For instance, the Mirai botnet highlights the immense destructive potential when threat actors can gain control over networked devices with weak security, a situation becoming increasingly common with ML systems. Many current ML devices function as "edge" devices meant to collect and process data locally before sending it to the cloud. Much like the cameras and DVRs compromised by Mirai, edge ML devices often rely on embedded hardware like ARM processors and run a lightweight OS like Linux. 
Securing the device credentials is critical. + +Similarly, the Jeep Cherokee hack was a watershed moment for the automotive industry. It exposed serious vulnerabilities in the growing number of network-connected vehicle systems and their lack of isolation from core drive systems like brakes and steering. In response, auto manufacturers invested heavily in new cybersecurity measures, though gaps likely remain. + +Chrysler issued a recall to patch the vulnerable Uconnect software that had allowed the remote exploit. This included adding network-level protections to prevent unauthorized external access and compartmentalizing in-vehicle systems to limit lateral movement. Additional layers of encryption were added for commands sent over the CAN bus within vehicles. + +The incident also spurred the creation of new cybersecurity standards and best practices. The [Auto-ISAC](https://automotiveisac.com/) was established for automakers to share intelligence, and the NHTSA issued guidance on managing risks. New testing and audit procedures were developed to assess vulnerabilities proactively. The aftereffects continue to drive change in the automotive industry as cars become increasingly software-defined. + +Unfortunately, in the rush to develop new ML edge devices, manufacturers often overlook security - using default passwords, unencrypted communications, unsecured firmware updates, etc. Any such vulnerabilities could allow attackers to gain access and control devices at scale by infecting them with malware. With a botnet of compromised ML devices, attackers could leverage their aggregated computational power for DDoS attacks on critical infrastructure. + +While these events didn't involve machine learning hardware directly, the principles of the attacks carry over to ML systems, which often involve similar embedded devices and network architectures. As ML hardware often operates in continuous interaction with the physical world, securing it against such breaches is paramount. 
The evolution of security measures in response to these incidents provides valuable insights into protecting current and future ML systems from analogous vulnerabilities. + +The distributed nature of ML edge devices means threats can propagate quickly across networks. And if devices are being used for mission-critical purposes like medical devices, industrial controls or self-driving vehicles, the potential physical damage from weaponized ML bots could be severe. Just like Mirai demonstrated the dangerous potential of poorly secured IoT devices, the litmus test for ML hardware security will be how vulnerable or resilient these devices are to worm-like attacks. The stakes are raised as ML spreads to safety-critical domains, putting the onus on manufacturers and system operators to incorporate the lessons from Mirai. + +The lesson is the importance of designing for security from the outset and having layered defenses. For ML systems, the Jeep case highlights potential blindspots around externally facing software interfaces as well as isolation between subsystems. Manufacturers of ML devices and platforms should assume a similar proactive and comprehensive approach to security rather than leaving it as an afterthought. Rapid response and dissemination of best practices will be key as threats continue evolving. + +## Security Threats to ML Models + +ML models face security risks that can undermine their integrity, performance, and trustworthiness if not properly addressed. While there are several different threats, the key threats include: 1) model theft, where adversaries steal the proprietary model parameters and the sensitive data they contain; 2) data poisoning, which compromises models through data tampering; and 3) adversarial attacks, which deceive the model to make incorrect or unwanted predictions. + +### Model Theft + +Model theft occurs when an attacker gains unauthorized access to a deployed ML model. 
The concern here is the theft of the model's structure and trained parameters and the proprietary data it contains [@ateniese2015hacking]. Model theft is a real and growing threat, as demonstrated by cases like ex-Google engineer Anthony Levandowski, who [allegedly stole Waymo's self-driving car designs](https://www.nytimes.com/2017/02/23/technology/google-self-driving-waymo-uber-otto-lawsuit.html) and started a competing company. Beyond economic impacts, model theft can seriously undermine privacy and enable further attacks. + +For instance, consider an ML model developed for personalized recommendations in an e-commerce application. If a competitor steals this model, they gain insights into business analytics, customer preferences, and even trade secrets embedded within the model's data. Attackers could leverage stolen models to craft more effective inputs for model inversion attacks, deducing private details about the model's training data. A cloned e-commerce recommendation model could reveal customer purchase behaviors and demographics. + +To understand model inversion attacks, consider a facial recognition system used to grant access to secured facilities. The system is trained on a dataset of employee photos. An attacker, by observing the model's output to various inputs, could infer features of the original dataset. For example, if the model's confidence level for a particular face is significantly higher for a given set of features, an attacker might deduce that someone with those features is likely in the training dataset. + +The methodology of model inversion typically involves the following steps: + +- **Accessing Model Outputs:** The attacker queries the ML model with input data and observes the outputs. This is often done through a legitimate interface, like a public API. + +- **Analyzing Confidence Scores:** For each input, the model provides a confidence score that reflects how similar the input is to the training data. 
+ +- **Reverse-Engineering:** By analyzing the confidence scores or output probabilities, attackers can use optimization techniques to reconstruct what they believe is close to the original input data. + +One historical example of a related privacy vulnerability was the research on de-anonymization attacks against the Netflix Prize dataset, where researchers demonstrated that it was possible to learn about an individual's movie preferences, which could lead to privacy breaches [@narayanan2006break]. + +Model theft can lead to economic losses, undermine competitive advantage, and violate user privacy. There's also the risk of model inversion attacks, where an adversary could input various data into the stolen model to infer sensitive information about the training data. + +Model theft attacks can be divided into two categories based on the desired asset: exact model properties and approximate model behavior. + +##### Stealing Exact Model Properties + +In these attacks, the objective is to extract information about concrete metrics, such as the learned parameters of a network, the fine-tuned hyperparameters, and the model's internal layer architecture [@oliynyk2023know]. + +- **Learned Parameters:** adversaries aim to steal the learned knowledge (weights and biases) of a model in order to replicate it. Parameter theft is generally used in conjunction with other attacks, such as architecture theft, which lacks parameter knowledge. + +- **Fine-Tuned Hyperparameters:** training is costly, and finding the right configuration of hyperparameters (such as the learning rate and regularization) can be a very long and expensive process. Thus, stealing an optimized model's hyperparameters can allow an adversary to replicate the model without the high training costs. + +- **Model Architecture:** this attack is concerned with the specific design and structure of the model, such as layers, neurons, and connectivity patterns. 
Aside from the reduction in associated training costs it can provide an attacker, this type of theft is especially dangerous because it concerns core IP theft, which can affect a company's competitive edge. Architecture theft can be achieved by exploiting side-channel attacks (discussed later). + +##### Stealing Approximate Model Behavior + +Instead of focusing on extracting exact numerical values of the model's parameters, these attacks aim at reproducing the model's behavior (predictions and effectiveness), decision-making, and high-level characteristics [@oliynyk2023know]. These techniques aim at achieving similar outcomes while allowing for internal deviations in parameters and architecture. Types of approximate behavior theft include achieving the same level of effectiveness and obtaining prediction consistency. + +- **Level of Effectiveness:** Rather than focus on the precise parameter values, attackers aim to replicate the model's decision-making capabilities. This is done through understanding the overall behavior of the model. Consider a scenario where an attacker wants to copy the behavior of an image classification model. Through analysis of the model's decision boundaries, the attacker tunes their model to reach a level of effectiveness comparable to the original model. This could entail analyzing 1) the confusion matrix to understand the balance of prediction metrics (true positive, true negative, false positive, false negative), and 2) other performance metrics, such as F1 score and precision, to ensure that the two models are comparable. + +- **Prediction Consistency:** The attacker tries to align their model's prediction patterns with those of the target model. This involves matching prediction outputs (both positive and negative) on the same set of inputs and ensuring distributional consistency across different classes. 
For instance, consider a natural language processing (NLP) model that performs sentiment analysis on movie reviews (labels reviews as positive, neutral, or negative). The attacker will try to fine-tune their model to match the predictions of the original model on the same set of movie reviews. This includes ensuring that the model makes the same mistakes (mispredictions) that the targeted model makes. + +#### Case Study + +In 2018, Tesla filed a [lawsuit](https://storage.courtlistener.com/recap/gov.uscourts.nvd.131251/gov.uscourts.nvd.131251.1.0_1.pdf) against self-driving car startup [Zoox](https://zoox.com/), alleging former employees stole confidential data and trade secrets related to Tesla's autonomous driving assistance system. + +Tesla claimed that several of its former employees took over 10GB of proprietary data including ML models and source code before joining Zoox. This allegedly included one of Tesla's crucial image recognition models used for identifying objects. + +The theft of this sensitive proprietary model could potentially help Zoox shortcut years of ML development and duplicate Tesla's capabilities. Tesla argued this theft of IP caused major financial and competitive harm. There were also concerns it could allow model inversion attacks to infer private details about Tesla's testing data. + +The Zoox employees denied stealing any proprietary information. However, the case highlights the significant risks of model theft - enabling cloning of commercial models, causing economic impacts, and opening the door for further data privacy violations. + +### Data Poisoning + +Data poisoning is an attack where the training data is tampered with, leading to a compromised model [@biggio2012poisoning]. Attackers can modify existing training examples, insert new malicious data points, or influence the data collection process. The poisoned data is labeled in such a way as to skew the model's learned behavior. 
This can be particularly damaging in applications where ML models make automated decisions based on learned patterns. Beyond training sets, poisoning test and validation data can allow adversaries to boost reported model performance artificially. + +The process usually involves the following steps: + +- **Injection:** The attacker adds incorrect or misleading examples into the training set. These examples are often designed to look normal on cursory inspection but have been carefully crafted to disrupt the learning process. + +- **Training:** The ML model trains on this manipulated dataset and develops skewed understandings of the data patterns. + +- **Deployment:** Once the model is deployed, the corrupted training leads to flawed decision-making or predictable vulnerabilities the attacker can exploit. + +The impacts of data poisoning extend beyond just classification errors or accuracy drops. For instance, if incorrect or malicious data is introduced into a traffic sign recognition system's training set, the model may learn to misclassify stop signs as yield signs, which can have dangerous real-world consequences, especially in embedded autonomous systems like autonomous vehicles. + +Data poisoning can degrade the accuracy of a model, force it to make incorrect predictions or cause it to behave unpredictably. In critical applications like healthcare, such alterations can lead to significant trust and safety issues. + +There are four main categories of data poisoning [@oprea2022poisoning]: + +* **Availability Attacks:** these attacks aim to compromise the overall functionality of a model. They cause it to misclassify the majority of testing samples, rendering the model unusable for practical applications. An example is label flipping, where labels of a specific, targeted class are replaced with labels from a different one. 
+ +* **Targeted Attacks:** in contrast to availability attacks, targeted attacks aim to compromise a small number of the testing samples. So the effect is localized to a limited number of classes, while the model maintains the same original level of accuracy on the majority of the classes. The targeted nature of the attack requires the attacker to possess knowledge of the model's classes. It also makes detecting these attacks more challenging. + +* **Backdoor Attacks:** in these attacks, an adversary targets specific patterns in the data. The attacker introduces a backdoor (a malicious, hidden trigger or pattern) into the training data, for example by manipulating certain features in structured data or a pattern of pixels at a fixed position. This causes the model to associate the malicious pattern with specific labels. As a result, when the model encounters test samples that contain the malicious pattern, it makes false predictions. + +* **Subpopulation Attacks:** here attackers selectively choose to compromise a subset of the testing samples, while maintaining accuracy on the rest of the samples. You can think of these attacks as a combination of availability and targeted attacks: performing availability attacks (performance degradation) within the scope of a targeted subset. Although subpopulation attacks may seem very similar to targeted attacks, the two have clear differences: + +- **Scope:** while targeted attacks target a selected set of samples, subpopulation attacks target a general subpopulation with similar feature representations. For example, in a targeted attack, an actor inserts manipulated images of a 'speed bump' warning sign (with carefully crafted perturbations or patterns), which causes an autonomous car to fail to recognize such a sign and slow down. On the other hand, manipulating all samples of people with a British accent so that a speech recognition model would misclassify a British person's speech is an example of a subpopulation attack. 
+ +- **Knowledge:** while targeted attacks require a high degree of familiarity with the data, subpopulation attacks require less intimate knowledge in order to be effective. + +#### Case Study 1 + +In 2017, researchers demonstrated a data poisoning attack against a popular toxicity classification model called Perspective [@hosseini2017deceiving]. This ML model is used to detect toxic comments online. + +The researchers added synthetically generated toxic comments with slight misspellings and grammatical errors to the model's training data. This slowly corrupted the model, causing it to misclassify increasing numbers of severely toxic inputs as non-toxic over time. + +After retraining on the poisoned data, the model's false negative rate increased from 1.4% to 27% - allowing extremely toxic comments to bypass detection. The researchers warned this stealthy data poisoning could enable the spread of hate speech, harassment, and abuse if deployed against real moderation systems. + +This case highlights how data poisoning can degrade model accuracy and reliability over time. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains. + +#### Case Study 2 + +Interestingly enough, data poisoning attacks are not always malicious [@shan2023prompt]. Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, utilizes data poisoning to help artists protect their art against scraping and copyright violations by generative AI models. Artists can use the tool to make subtle modifications to their images before uploading them online. + +While these changes are indiscernible to the human eye, they can significantly disrupt the performance of generative AI models when incorporated into the training data. 
Generative models can be manipulated into generating hallucinations and weird images. For example, with only 300 poisoned images, the University of Chicago researchers were able to trick the latest Stable Diffusion model into generating images of dogs that look like cats or images of cows when prompted for cars. + +As the number of poisoned images on the internet increases, the performance of the models that use scraped data will deteriorate exponentially. First, the poisoned data is hard to detect and would require a manual elimination process. Second, the "poison" spreads quickly to other labels because generative models rely on connections between words and concepts as they generate images. So a poisoned image of a "car" could spread into generated images associated with words like "truck", "train", "bus", etc. + +On the flip side, this tool can be used maliciously and can affect legitimate applications of the generative models. This goes to show the very challenging and novel nature of machine learning attacks. + +Figure 1 shows example images from a poisoned model at different levels of poisoning: + +![Figure 1: The effects of different levels of data poisoning (50 samples, 100 samples, and 300 samples of poisoned images) on generating images in different categories. The progression of producing cat images from a dog prompt, cow images from a car prompt, and other poisoned results is demonstrated above.](images/security_privacy/image14.png) + +### Adversarial Attacks + +Adversarial attacks are methods that aim to trick models into making incorrect predictions by providing them with specially crafted, deceptive inputs (called adversarial examples) [@parrish2023adversarial]. By adding slight, often imperceptible perturbations to input data, adversaries can "hack" a model's pattern recognition and deceive it into making a wrong prediction.
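To make the idea of a "slight perturbation" concrete, the following is a minimal sketch in the spirit of the fast gradient sign method, applied to a toy logistic-regression classifier. The weights `W`, bias `B`, input `X`, and step size `eps` are hypothetical values chosen only for illustration; they do not come from any model discussed in this chapter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(w, b, x, y_true, eps):
    """Move each feature by at most eps in the direction that increases the loss.

    For logistic loss, the gradient with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y_true) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model parameters and input (assumptions for illustration only).
W = [2.0, -3.0, 1.0]
B = 0.0
X = [0.3, 0.1, 0.2]

x_adv = fgsm_perturb(W, B, X, y_true=1, eps=0.15)
print("clean score:", predict_proba(W, B, X))
print("adversarial score:", predict_proba(W, B, x_adv))
```

In this toy setting every feature changes by no more than `eps`, yet the score crosses the 0.5 decision threshold. Real attacks on deep networks follow the same principle but compute the gradient through the full model.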
+ +In text-to-image models like DALLE [@ramesh2021zero] or Stable Diffusion [@Rombach22cvpr], one can generate prompts that lead to unsafe images. In image recognition, by altering the pixel values of an image, attackers can deceive a facial recognition system into identifying a face as a different person. + +Adversarial attacks exploit the way ML models learn and make decisions during inference. These models work on the principle of recognizing patterns in data. An adversary crafts special inputs with perturbations to mislead the model's pattern recognition, essentially 'hacking' the model's perceptions. + +Adversarial attacks fall under different scenarios: + +- **Whitebox Attacks:** the attacker possesses full knowledge of the target model's internal workings, including the training data, parameters, and architecture. This comprehensive access creates favorable conditions for an attacker to exploit the model's vulnerabilities. The attacker can take advantage of specific and subtle weaknesses to craft effective adversarial examples. + +- **Blackbox Attacks:** in contrast to whitebox attacks, in blackbox attacks, the attacker has little to no knowledge of the target model. To carry out the attack, the adversarial actor needs to make careful observations of the model's output behavior. + +- **Greybox Attacks:** these fall in between blackbox and whitebox attacks. The attacker has only partial knowledge about the target model's internal design. For example, the attacker could have knowledge about training data but not the architecture or parameters. In the real world, practical attacks usually fall under the blackbox and greybox scenarios. + +The landscape of machine learning models is both complex and broad, especially given their relatively recent integration into commercial applications. This rapid adoption, while transformative, has brought to light numerous vulnerabilities within these models.
Consequently, a diverse array of adversarial attack methods has emerged, each strategically exploiting different aspects of different models. Below, we highlight a subset of these methods, showcasing the multifaceted nature of adversarial attacks on machine learning models: + +* **Generative Adversarial Networks (GANs)** are deep learning models that consist of two networks competing against each other: a generator and a discriminator [@goodfellow2020generative]. The generator tries to synthesize realistic data, while the discriminator evaluates whether it is real or fake. GANs can be used to craft adversarial examples. The generator network is trained to produce inputs that are misclassified by the target model. These GAN-generated images can then be used to attack a target classifier or detection model. The generator and the target model are engaged in a competitive process, with the generator continually improving its ability to create deceptive examples, and the target model enhancing its resistance to such examples. GANs provide a powerful framework for crafting complex and diverse adversarial inputs, illustrating the adaptability of generative models in the adversarial landscape. + +* **Transfer Learning Adversarial Attacks** exploit the knowledge transferred from a pre-trained model to a target model, enabling the creation of adversarial examples that can deceive both models. These attacks pose a growing concern, particularly when adversaries have knowledge of the feature extractor but lack access to the classification head (the part or layer that is responsible for making the final classifications). Referred to as "headless attacks," these transferable adversarial strategies leverage the expressive capabilities of feature extractors to craft perturbations while being oblivious to the label space or training data.
The existence of such attacks underscores the importance of developing robust defenses for transfer learning applications, especially since pre-trained models are commonly used [@Abdelkader_2020]. + +#### Case Study + +In 2017, researchers conducted experiments by placing small black and white stickers on stop signs [@eykholt2018robust]. To a human observer, the stickers did not obscure the sign or make it harder to read. However, when images of the stickered stop signs were fed into standard traffic sign classification ML models, they were misclassified as speed limit signs over 85% of the time. + +This demonstration showed how simple adversarial stickers could trick ML systems into misreading critical road signs. These attacks could endanger public safety if deployed in the real world, causing autonomous vehicles to misinterpret stop signs as speed limits. Researchers warned this could potentially cause dangerous rolling stops or acceleration into intersections. + +This case study provides a concrete illustration of how adversarial examples exploit the way ML models recognize patterns. By subtly manipulating the input data, attackers can induce incorrect predictions and create serious risks for safety-critical applications like self-driving cars. The attack's simplicity shows how even minor changes that humans would overlook can lead models astray. Developers need robust defenses against such threats. + +## Security Threats to ML Hardware + +Discussing the threats to embedded ML hardware security in a structured order is useful for a clear and in-depth understanding of the potential pitfalls for ML systems. We will begin with hardware bugs, where intrinsic design flaws in the hardware can be a gateway to exploitation. This forms the fundamental knowledge required to understand the genesis of hardware vulnerabilities.
Moving to physical attacks establishes the basic threat model from there, as these are the most overt and direct methods of compromising hardware integrity. Fault-injection attacks naturally extend this discussion, showing how specific manipulations can induce systematic failures. + +Advancing to side-channel attacks next will show the increasing complexity, as these rely on exploiting indirect information leakages, requiring a nuanced understanding of hardware operations and environmental interactions. Leaky interfaces will show how external communication channels can become vulnerable, leading to inadvertent data exposures. Counterfeit hardware discussions benefit from prior explorations of hardware integrity and exploitation techniques, as they often compound these issues with additional risks due to their questionable provenance. Finally, supply chain risks encompass all concerns above and frame them within the context of the hardware's journey from production to deployment, highlighting the multifaceted nature of hardware security and the need for vigilance at every stage. + +Here's an overview table summarizing the topics: + +| Threat Type | Description | Relevance to Embedded ML Hardware Security | +| ----------------------- | -------------------------------------------------------------------------------------------------- | ------------------------------------------------ | +| Hardware Bugs | Intrinsic flaws in hardware designs that can compromise system integrity. | Foundation of hardware vulnerability. | +| Physical Attacks | Direct exploitation of hardware through physical access or manipulation. | Basic and overt threat model. | +| Fault-injection Attacks | Induction of faults to cause errors in hardware operation, leading to potential system compromise. | Systematic manipulation leading to failure. | +| Side-Channel Attacks | Exploitation of leaked information from hardware operation to extract sensitive data. 
| Indirect attack via environmental observation. | +| Leaky Interfaces | Vulnerabilities arising from interfaces that expose data unintentionally. | Data exposure through communication channels. | +| Counterfeit Hardware | Use of unauthorized hardware components that may have security flaws. | Compounded vulnerability issues. | +| Supply Chain Risks | Risks introduced through the lifecycle of hardware, from production to deployment. | Cumulative and multifaceted security challenges. | + +### Hardware Bugs + +Hardware is not immune to the pervasive issue of design flaws or bugs. Attackers can exploit these vulnerabilities to access, manipulate, or extract sensitive data, breaching the confidentiality and integrity that users and services depend on. An example of such vulnerabilities came to light with the discovery of Meltdown and Spectre---two hardware vulnerabilities that exploit critical vulnerabilities in modern processors. These bugs allow attackers to bypass the hardware barrier that separates applications, allowing a malicious program to read the memory of other programs and the operating system. + +Meltdown [@Lipp2018meltdown] and Spectre [@Kocher2018spectre] work by taking advantage of optimizations in modern CPUs that allow them to speculatively execute instructions out of order before validity checks have completed. This reveals data that should be inaccessible, which the attack captures through side channels like caches. The technical complexity demonstrates the difficulty of eliminating vulnerabilities even with extensive validation. + +If an ML system is processing sensitive data, such as personal user information or proprietary business analytics, Meltdown and Spectre represent a real and present danger to data security. Consider the case of an ML accelerator card, which is designed to speed up machine learning processes, such as the ones we discussed in the [AI Hardware](./hw_acceleration.qmd) chapter. 
These accelerators work in tandem with the CPU to handle complex calculations, often related to data analytics, image recognition, and natural language processing. If such an accelerator card has a vulnerability akin to Meltdown or Spectre, it could potentially leak the data it processes. An attacker could exploit this flaw not just to siphon off data but also to gain insights into the ML model's workings, including potentially reverse-engineering the model itself (thus going back to the issue of [model theft](@sec-model_theft)). + +A real-world scenario where this could be devastating would be in the healthcare industry. Here, ML systems routinely process highly sensitive patient data to help diagnose, plan treatment, and forecast outcomes. A bug in the system's hardware could lead to the unauthorized disclosure of personal health information, violating patient privacy and contravening strict regulatory standards like the [Health Insurance Portability and Accountability Act (HIPAA)](https://www.cdc.gov/phlp/publications/topic/hipaa.html). + +The [Meltdown and Spectre](https://meltdownattack.com/) vulnerabilities are stark reminders that hardware security is not just about preventing unauthorized physical access, but also about ensuring that the hardware's architecture does not become a conduit for data exposure. Similar hardware design flaws regularly emerge in CPUs, accelerators, memory, buses, and other components. This necessitates ongoing retroactive mitigations and performance tradeoffs in deployed systems. Proactive solutions like confidential computing architectures could mitigate entire classes of vulnerabilities through fundamentally more secure hardware design. Thwarting hardware bugs requires rigor at every stage of design, validation, and deployment. + +### Physical Attacks + +Physical tampering refers to the direct, unauthorized manipulation of physical computing resources to undermine the integrity of machine learning systems.
It's a particularly insidious attack because it circumvents traditional cybersecurity measures, which often focus more on software vulnerabilities than hardware threats. + +Physical tampering can take many forms, from the relatively simple, such as someone inserting a USB device loaded with malicious software into a server, to the highly sophisticated, such as embedding a hardware Trojan during the manufacturing process of a microchip (discussed later in greater detail in the Supply Chain section). ML systems are susceptible to this attack because they rely on the accuracy and integrity of their hardware to process and analyze vast amounts of data correctly. + +Consider an ML-powered drone used for geographical mapping. The drone's operation relies on a series of onboard systems, including a navigation module that processes inputs from various sensors to determine its path. If an attacker gains physical access to this drone, they could replace the genuine navigation module with a compromised one that includes a backdoor. This manipulated module could then alter the drone's flight path to conduct surveillance over restricted areas or even smuggle contraband by flying undetected routes. + +Another example is the physical tampering of biometric scanners used for access control in secure facilities. By introducing a modified sensor that transmits biometric data to an unauthorized receiver, an attacker can access personal identification data to authenticate individuals. + +There are several ways that physical tampering can occur in ML hardware: + +- **Manipulating sensors:** Consider an autonomous vehicle that relies on cameras and LiDAR for situational awareness. An attacker could carefully calibrate the physical alignment of these sensors to introduce blindspots or distort critical distances. This could impair object detection and endanger passengers. + +- **Hardware trojans:** Malicious circuit modifications can introduce trojans that activate under certain inputs. 
For example, an ML accelerator chip could function normally until a rare trigger case occurs, causing it to accelerate unsafely. + +- **Tampering with memory:** Physically exposing and manipulating memory chips could allow extraction of encrypted ML model parameters. Fault injection techniques can also corrupt model data to degrade accuracy. + +- **Introducing backdoors:** Gaining physical access to servers, an adversary could use hardware keyloggers to capture passwords and create backdoor accounts for persistent access. These could then be used to exfiltrate ML training data over time. + +- **Supply chain attacks:** Manipulating third-party hardware components or compromising manufacturing and shipping channels creates systemic vulnerabilities that are difficult to detect and remediate. + +### Fault-injection Attacks + +By intentionally introducing faults into ML hardware, attackers can induce errors in the computational process, leading to incorrect outputs. This manipulation compromises the integrity of ML operations and can serve as a vector for further exploitation, such as system reverse engineering or security protocol bypass. Fault injection involves intentionally disrupting normal computations in a system through external interference [@joye2012fault]. By precisely triggering computational errors, adversaries can alter program execution in ways that degrade reliability or leak sensitive information. + +Various physical tampering techniques can be used for fault injection. Low voltage [@barenghi2010low], power spikes [@hutter2009contact], clock glitches [@amiel2006fault], electromagnetic pulses [@agrawal2003side], temperature increase [@skorobogatov2009local], and laser strikes [@skorobogatov2003optical] are common hardware attack vectors. They are precisely timed to induce faults like flipped bits or skipped instructions during key operations.
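As a rough illustration of why a single flipped bit matters, the sketch below simulates a bit flip in the IEEE 754 single-precision encoding of a model weight. It is a software-only simulation under the assumption that weights are stored as 32-bit floats; the helper name `flip_bit_float32` and the example weight are hypothetical, and no physical fault mechanism is modeled.

```python
import struct


def flip_bit_float32(value, bit):
    """Return `value` with one bit flipped in its 32-bit float encoding.

    Bit 31 is the sign, bits 30-23 the exponent, bits 22-0 the mantissa.
    """
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # simulate the induced fault
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


weight = 0.5  # hypothetical stored weight
for bit in (0, 23, 30, 31):
    print(f"bit {bit:2d}: {weight} -> {flip_bit_float32(weight, bit)}")
```

A flip in a low mantissa bit barely changes the weight, while a flip in the top exponent bit changes its magnitude by many orders, which is why the location of an induced fault matters so much.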
+ +For ML systems, consequences include impaired model accuracy, denial of service, extraction of private training data or model parameters, and reverse engineering of model architectures. Attackers could use fault injection to force misclassifications, disrupt autonomous systems, or steal intellectual property. + +For example, in [@breier2018deeplaser], the authors successfully mounted a fault-injection attack on a deep neural network deployed on a microcontroller. They used a laser to heat up specific transistors, forcing them to switch states. In one instance, they used this method to attack a ReLU activation function, causing the function to always output a value of 0, regardless of the input. In the assembly code in Figure 2, the attack caused the executing program to always skip the `jmp` end instruction on line 6. This means that `HiddenLayerOutput[i]` is always set to 0, overwriting any values written to it on lines 4 and 5. As a result, the targeted neurons are rendered inactive, resulting in misclassifications. + +![Figure 2: Assembly code demonstrating the impact of the fault-injection attack on a deep neural network deployed on a microcontroller. The fault-injection attack caused the program to always skip the jmp end instruction on line 6, causing `HiddenLayerOutput[i]` to be reset to zero. This overwrites any values written to it on lines 4 and 5, rendering certain neurons always inactive and resulting in misclassifications.](images/security_privacy/image3.png) + +The strategy for an attacker could be to infer information about the activation functions using side-channel attacks (discussed next). Then the attacker could attempt to target multiple activation function computations by randomly injecting faults into the layers that are as close to the output layer as possible. This increases the likelihood and impact of the attack.
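The effect of the skipped jump can be loosely emulated in software. The sketch below is not the assembly from Figure 2; it is a Python approximation of the described control flow, in which a `fault` flag stands in for the skipped jump so that execution falls through to the zeroing path. The tiny network, its weights, and all function names are hypothetical.

```python
def relu_with_fault(pre_activation, fault=False):
    """ReLU whose early exit can be 'skipped' to mimic the injected fault."""
    if pre_activation > 0:
        output = pre_activation  # positive branch stores the value
        if not fault:
            return output  # normal run: the jump to the end is taken
    # Fall-through path: the output is reset to zero.
    return 0.0


def hidden_layer(inputs, weights, biases, faulty_neurons=frozenset()):
    """Dense layer followed by ReLU; neurons in `faulty_neurons` are attacked."""
    outputs = []
    for i, (row, b) in enumerate(zip(weights, biases)):
        pre = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(relu_with_fault(pre, fault=i in faulty_neurons))
    return outputs


def classify(hidden, out_weights):
    """Return the index of the highest-scoring output class."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]
    return scores.index(max(scores))


# Hypothetical two-neuron network (values chosen only for illustration).
x = [1.0, 2.0]
W_hidden = [[1.0, 0.0], [0.0, 1.0]]
b_hidden = [0.0, 0.0]
W_out = [[1.0, 0.0], [0.0, 1.0]]

clean = classify(hidden_layer(x, W_hidden, b_hidden), W_out)
faulty = classify(hidden_layer(x, W_hidden, b_hidden, faulty_neurons={1}), W_out)
print("clean prediction:", clean, "| prediction under fault:", faulty)
```

Silencing a single hidden neuron is enough to change the predicted class in this toy network, which mirrors how inactive neurons lead to misclassifications in the attack described above.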
+ +Embedded devices are particularly vulnerable due to limited physical hardening and resource constraints that restrict robust runtime defenses. Without tamper-resistant packaging, attacker access to system buses and memory enables precise fault strikes. Lightweight embedded ML models also lack redundancy to overcome errors. + +These attacks can be particularly insidious because they bypass traditional software-based security measures, often not accounting for physical disruptions. Furthermore, because ML systems rely heavily on the accuracy and reliability of their hardware for tasks like pattern recognition, decision-making, and automated responses, any compromise in their operation due to fault injection can have serious and wide-ranging consequences. + +Mitigating fault injection risks necessitates a multilayer approach. Physical hardening through tamper-proof enclosures and design obfuscation helps reduce access. Lightweight anomaly detection can identify unusual sensor inputs or erroneous model outputs [@hsiao2023mavfi]. Error-correcting memories minimize disruption, while data encryption safeguards information. Emerging model watermarking techniques trace stolen parameters. + +However, balancing robust protections with embedded systems' tight size and power limits remains challenging. Cryptography limits and lack of secure co-processors on cost-sensitive embedded hardware restrict options. Ultimately, fault injection resilience demands a cross-layer perspective spanning electrical, firmware, software, and physical design layers. + +### Side-Channel Attacks + +Side-channel attacks are a category of security breach that depends on information gained from the physical implementation of a computer system. Unlike direct attacks on software or network vulnerabilities, side-channel attacks exploit the hardware characteristics of a system. 
These attacks can be particularly effective against complex machine learning systems, where large amounts of data are processed and a high level of security is expected. + +The fundamental premise of a side-channel attack is that a device's operation can inadvertently leak information. Such leaks can come from various sources, including the electrical power a device consumes [@kocher1999differential], the electromagnetic fields it emits [@gandolfi2001electromagnetic], the time it takes to process certain operations or even the sounds it produces. Each channel can indirectly glimpse the system's internal processes, revealing information that can compromise security. + +For instance, consider a machine learning system performing encrypted transactions. Encryption algorithms are supposed to secure data but also require computational work to encrypt and decrypt information. An attacker can analyze the power consumption patterns of the device performing encryption to figure out the cryptographic key. With sophisticated statistical methods, small variations in power usage during the encryption process can be correlated with the data being processed, eventually revealing the key. Some differential analysis attack techniques are Differential Power Analysis (DPA) [@Kocher2011Intro], Differential Electromagnetic Analysis (DEMA), and Correlation Power Analysis (CPA). + +For example, consider an attacker who is trying to break the AES encryption algorithm using a differential analysis attack. The attacker would first need to collect a large number of power or electromagnetic traces (a trace is a record of consumptions or emissions) of the device while it is performing AES encryption. + +Once the attacker has collected a sufficient number of traces, they would then use a statistical technique to identify correlations between the traces and the different values of the plaintext (original, unencrypted text) and ciphertext (encrypted text). 
These correlations would then be used to infer the value of a bit in the AES key, and eventually the entire key. Differential analysis attacks are dangerous because they are low cost, effective, and non-intrusive, which allows attackers to bypass both algorithmic and hardware-level security measures. Compromises by these attacks are also hard to detect because they do not physically modify the device or break the encryption algorithm. + +Below is a simplified visualization of how analyzing the power consumption patterns of the encryption device can help us extract information about algorithm's operations and, in turn, about the secret data. Say we have a device that takes a 5-byte password as input. We are going to analyze and compare the different voltage patterns that are measured while the encryption device is performing operations on the input to authenticate the password. + +First, consider the power analysis of the device's operations after entering a correct password in the first picture in Figure 3. The dense blue graph is the output of the encryption device's voltage measurement. What matters here is the comparison between the different analysis charts rather than the specific details of what is going on in each scenario. + +![Figure 3: The power analysis chart of the encryption device's operations after the correct password is entered. The dense blue graph represents the output of the encryption device’s voltage measurement.](images/security_privacy/image5.png) + +Now, let's look at the power analysis chart when we enter an incorrect password in Figure 4. The first three bytes of the password are correct. As a result, we can see that the voltage patterns are very similar or identical between the two charts, up to and including the fourth byte. After the device processes the fourth byte, it determines that there is a mismatch between the secret key and the attempted input. 
We notice a change in the pattern at the transition point between the fourth and fifth bytes: the voltage has gone up (the current has gone down) because the device has stopped processing the rest of the input. + +![Figure 4: The power analysis chart of the encryption device's operations after an incorrect password is entered. The first three bytes are correct, but the fourth byte is incorrect, causing the device to spot a mismatch between the secret key and the attempted input. As a result, the device stops processing the rest of the input, and the voltage, as represented by the blue graph, increases.](images/security_privacy/image16.png) + +Figure 5 shows another chart, this time for a completely wrong password. After the device finishes processing the first byte, it determines that it is incorrect and stops further processing: the voltage goes up and the current goes down. + +![Figure 5: Another example power analysis chart of the encryption device's operations after an incorrect password is entered. In this case, the first byte is incorrect, so the device spots this mismatch and does not process the rest of the inputs.](images/security_privacy/image15.png) + +The example above shows how we can infer information about the encryption process and the secret key itself by analyzing different inputs and 'eavesdropping' on the operations that the device performs on each byte of the input. + +For additional details, please see the following video: + +{{< video https://www.youtube.com/watch?v=2iDLfuEBcs8&ab_channel=ColinO'Flynn title="ECED4406 - 0x501 Power Analysis Attacks" }} + +Another example is an ML system for speech recognition, which processes voice commands to perform actions. By measuring the time it takes for the system to respond to commands or the power used during processing, an attacker could infer what commands are being processed and thus learn about the system's operational patterns.
Even more subtle, the sound emitted by a computer's fan or hard drive could change in response to the workload, which a sensitive microphone could pick up and analyze to determine what kind of operations are being performed. + +In real-world scenarios, side-channel attacks have been used to extract encryption keys and compromise secure communications. One of the earliest recorded side-channel attacks dates back to the 1960s when British intelligence agency MI5 faced the challenge of deciphering encrypted communications from the Egyptian Embassy in London. Their cipher-breaking attempts were thwarted by the computational limitations of the time until an ingenious observation changed the game. + +MI5 agent Peter Wright proposed using a microphone to capture the subtle acoustic signatures emitted from the embassy's rotor cipher machine during encryption [@Burnet1989Spycatcher]. The distinct mechanical clicks of the rotors as operators configured them daily leaked critical information about the initial settings. This simple side channel of sound enabled MI5 to reduce the complexity of deciphering messages dramatically. This early acoustic leak attack highlights that side-channel attacks are not merely a digital age novelty but a continuation of age-old cryptanalytic principles. The notion that where there is a signal, there is an opportunity for interception remains foundational. From mechanical clicks to electrical fluctuations and beyond, side channels enable adversaries to extract secrets indirectly through careful signal analysis. + +Today, acoustic cryptanalysis has evolved into attacks like keyboard eavesdropping [@Asonov2004Keyboard]. Electrical side channels range from power analysis on cryptographic hardware [@gnad2017voltage] to voltage fluctuations [@zhao2018fpga] on machine learning accelerators. Timing, electromagnetic emission, and even heat footprints can likewise be exploited. 
New and unexpected side channels often emerge as computing becomes more interconnected and miniaturized. + +Just as MI5's analogue acoustic leak transformed their codebreaking, modern side-channel attacks circumvent traditional boundaries of cyber defense. Understanding the creative spirit and historical persistence of side channel exploits is key knowledge for developers and defenders seeking to secure modern machine learning systems comprehensively against digital and physical threats. + +### Leaky Interfaces + +Leaky interfaces in embedded systems are often overlooked backdoors that can become significant security vulnerabilities. While designed for legitimate purposes such as communication, maintenance, or debugging, these interfaces may inadvertently provide attackers with a window through which they can extract sensitive information or inject malicious data. + +An interface becomes "leaky" when it exposes more information than it should, often due to a lack of stringent access controls or inadequate shielding of the transmitted data. Here are some real-world examples of leaky interface issues causing security problems in IoT and embedded devices: + +- **Baby Monitors:** Many WiFi-enabled baby monitors have been found to have unsecured interfaces for remote access. This allowed attackers to gain live audio and video feeds from people's homes, representing a major [privacy violation](https://www.fox19.com/story/25310628/hacked-baby-monitor/). + +- **Pacemakers:** Interface vulnerabilities were discovered in some [pacemakers](https://www.fda.gov/medical-devices/medical-device-recalls/abbott-formally-known-st-jude-medical-recalls-assuritytm-and-enduritytm-pacemakers-potential) that could allow attackers to manipulate cardiac functions if exploited. This presents a potential life-threatening scenario. 
+ +- **Smart Lightbulbs:** A researcher found he could access unencrypted data from smart lightbulbs via a debug interface, including WiFi credentials, allowing him to gain access to the connected network [@dhanjani2015abusing]. + +- **Smart Cars:** The OBD-II diagnostic port has been shown to provide an attack vector into automotive systems if left unsecured. Researchers were able to take control of brakes and other components through it [@miller2015remote]. + +While the above are not directly connected with ML, consider the example of a smart home system with an embedded ML component that controls home security based on behavior patterns it learns over time. The system includes a maintenance interface accessible via the local network for software updates and system checks. If this interface does not require strong authentication or if the data transmitted through it is not encrypted, an attacker on the same network could potentially gain access to it. They could then eavesdrop on the homeowner's daily routines or reprogram the security settings by manipulating the firmware. + +Such leaks are both a privacy issue and a potential entry point for more damaging exploits. The exposure of training data, model parameters, or ML outputs from a leak could help adversaries construct adversarial examples or reverse-engineer models. Access through a leaky interface could also be used to alter an embedded device's firmware, loading it with malicious code that could disable the device, intercept data, or use the device in botnet attacks. + +To mitigate these risks, a multilayered approach is necessary, spanning technical controls (authentication, encryption, and anomaly detection) as well as policies and processes (interface inventories, access controls, auditing, and secure development practices). Disabling unnecessary interfaces and compartmentalizing risks via a zero-trust model provide additional protection. 
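As one illustration of the authentication control mentioned above, the following is a minimal sketch of a challenge-response check that a maintenance interface could apply before accepting a command. It assumes a secret key shared between the device and an authorized maintenance tool; the names `DEVICE_KEY`, `new_challenge`, `sign_request`, and `verify_request` are hypothetical and chosen only for this example. It authenticates requests but does not encrypt them, and a real device would also need to make sure each nonce is accepted only once.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret, provisioned on the device and the maintenance tool.
DEVICE_KEY = secrets.token_bytes(32)


def new_challenge() -> bytes:
    """Device side: issue a fresh random 16-byte nonce for each request."""
    return secrets.token_bytes(16)


def sign_request(key: bytes, nonce: bytes, command: bytes) -> bytes:
    """Tool side: bind the command to the nonce with HMAC-SHA256."""
    return hmac.new(key, nonce + command, hashlib.sha256).digest()


def verify_request(key: bytes, nonce: bytes, command: bytes, tag: bytes) -> bool:
    """Device side: recompute the tag and compare it in constant time."""
    expected = hmac.new(key, nonce + command, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


nonce = new_challenge()
command = b"firmware-status"
tag = sign_request(DEVICE_KEY, nonce, command)

# A request from the authorized tool is accepted.
print(verify_request(DEVICE_KEY, nonce, command, tag))               # True
# The same tag does not authorize a different command.
print(verify_request(DEVICE_KEY, nonce, b"flash-firmware", tag))     # False
# A tool that does not hold the device key is rejected.
print(verify_request(secrets.token_bytes(32), nonce, command, tag))  # False
```

Because the tag covers both the nonce and the command, an eavesdropper on the local network cannot reuse a captured tag for a different command or a later challenge.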
+ +As designers of embedded ML systems, we should assess interfaces early in development and continually monitor them post-deployment as part of an end-to-end security lifecycle. Understanding and securing interfaces is crucial for ensuring the overall security of embedded ML. + +### Counterfeit Hardware + +ML systems are only as reliable as the underlying hardware. In an era where hardware components are global commodities, the rise of counterfeit or cloned hardware presents a significant challenge. Counterfeit hardware encompasses any components that are unauthorized reproductions of original parts. Counterfeit components infiltrate ML systems through complex supply chains that stretch across borders and involve numerous stages from manufacture to delivery. + +A single lapse in the supply chain's integrity can result in the insertion of counterfeit parts designed to imitate the functions and appearance of genuine hardware closely. For instance, a facial recognition system for high-security access control may be compromised if equipped with counterfeit processors. These processors could fail to accurately process and verify biometric data, potentially allowing unauthorized individuals to access restricted areas. + +The challenge with counterfeit hardware is multifaceted. It undermines the quality and reliability of ML systems, as these components may degrade faster or perform unpredictably due to substandard manufacturing. The security risks are also profound; counterfeit hardware can contain vulnerabilities ripe for exploitation by malicious actors. For example, a cloned network router in an ML data center might include a hidden backdoor, enabling data interception or network intrusion without detection. + +Furthermore, counterfeit hardware poses legal and compliance risks. 
Companies inadvertently utilizing counterfeit parts in their ML systems may face serious legal repercussions, including fines and sanctions for failing to comply with industry regulations and standards. This is particularly true for sectors where compliance with specific safety and privacy regulations is mandatory, such as healthcare and finance. + +The issue of counterfeit hardware is exacerbated by the economic pressures of reducing costs, which can compel businesses to source from lower-cost suppliers without stringent verification processes. This economizing can inadvertently introduce counterfeit parts into otherwise secure systems. Additionally, detecting these counterfeits is inherently difficult since they are created to pass as the original components, often requiring sophisticated equipment and expertise to identify. + +In ML, where decisions are made in real-time and based on complex computations, the consequences of hardware failure are inconvenient and potentially dangerous. Stakeholders in the field of ML need to understand these risks thoroughly. The issues presented by counterfeit hardware necessitate a deep dive into the current challenges facing ML system integrity and emphasize the importance of vigilant, informed management of the hardware life cycle within these advanced systems. + +### Supply Chain Risks + +The threat of counterfeit hardware is closely tied to broader supply chain vulnerabilities. Globalized, interconnected supply chains create multiple opportunities for compromised components to infiltrate a product's lifecycle. Supply chains involve numerous entities from design to manufacturing, assembly, distribution, and integration. A lack of transparency and oversight of each partner makes verifying integrity at every step challenging. Lapses anywhere along the chain can allow the insertion of counterfeit parts. 
+ +For example, a contracted manufacturer may unknowingly receive and incorporate recycled electronic waste containing dangerous counterfeits. An untrustworthy distributor could smuggle in cloned components. Insider threats at any vendor might deliberately mix counterfeits into legitimate shipments. + +Once counterfeits enter the supply stream, they move quickly through multiple hands before ending up in ML systems where detection is difficult. Advanced counterfeits like refurbished parts or clones with repackaged externals can masquerade as authentic components, passing visual inspection. + +Thorough technical profiling using micrography, X-ray screening, component forensics, and functional testing is often required to identify fakes. However, such costly analysis is impractical for large-volume procurement. + +Strategies like supply chain audits, screening suppliers, validating component provenance, and adding tamper-evident protections can help mitigate risks. But ultimately, a zero-trust approach is prudent given global supply chain security challenges. Designing ML systems to utilize redundant checking, fail-safes, and continuous runtime monitoring provides resilience against component compromises. + +Rigorous validation of hardware sources coupled with fault-tolerant system architectures offers the most robust defense against the pervasive risks of convoluted, opaque global supply chains. + +#### Case Study + +In 2018, Bloomberg Businessweek published an alarming [story](https://www.bloomberg.com/news/features/2018-10-04/the-big-hack-how-china-used-a-tiny-chip-to-infiltrate-america-s-top-companies) that got much attention in the tech world. The article claimed that tiny spy chips had been secretly planted on server hardware by Supermicro. Reporters said Chinese state hackers working with Supermicro could sneak these tiny chips onto motherboards during manufacturing. 
The tiny chips allegedly gave the hackers backdoor access to servers used by over 30 major companies, including Apple and Amazon. + +If true, this would allow hackers to spy on private data or even tamper with systems. But after investigating, Apple and Amazon found no proof that such hacked Supermicro hardware existed. Other experts questioned whether the Bloomberg article was accurate. + +Whether or not the story is completely true is not our concern from a pedagogical viewpoint. However, this incident drew attention to the risks of global supply chains for hardware, especially hardware manufactured in China. When companies outsource and buy hardware components from vendors worldwide, there needs to be more visibility into the process. In this complex global pipeline, there are concerns that counterfeits or tampered hardware could be slipped in somewhere along the way without tech companies realizing it. Heavy reliance on a single manufacturer or distributor also creates risk. For instance, due to the overreliance on [TSMC](https://www.tsmc.com/english) for semiconductor manufacturing, the US has committed about 50 billion dollars through the [CHIPS Act](https://www.whitehouse.gov/briefing-room/statements-releases/2022/08/09/fact-sheet-chips-and-science-act-will-lower-costs-create-jobs-strengthen-supply-chains-and-counter-china/). + +As ML moves into more critical systems, verifying hardware integrity from design through production and delivery is crucial. The Supermicro report highlighted that for ML security, we cannot take global supply chains and manufacturing for granted. We must inspect and validate hardware at every link in the chain. + +## Embedded ML Hardware Security + +### Trusted Execution Environments + +#### About TEE + +A Trusted Execution Environment (TEE) is a secure area within a main processor that provides a high level of security for the execution of code and protection of data. 
TEEs operate by isolating the execution of sensitive tasks from the rest of the device's operations, thereby creating an environment resistant to attacks from software and hardware vectors. + +#### Benefits + +TEEs are particularly valuable in scenarios where sensitive data must be processed or where the integrity of a system's operations is critical. In the context of ML hardware, TEEs ensure that the ML algorithms and data are protected against tampering and leakage. This is essential because ML models often process private information, trade secrets, or data that could be exploited if exposed. + +For instance, a TEE can protect ML model parameters from being extracted by malicious software on the same device. This protection is vital for privacy and maintaining the integrity of the ML system, ensuring that the models perform as expected and do not provide skewed outputs due to manipulated parameters. [Apple's Secure Enclave](https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web), found in iPhones and iPads, is a form of TEE that provides an isolated environment to protect sensitive user data and cryptographic operations. + +In ML systems, TEEs can: + +- Securely perform model training and inference, ensuring that the computation results remain confidential. + +- Protect the confidentiality of input data, like biometric information, used for personal identification or sensitive classification tasks. + +- Secure ML models by preventing reverse engineering, which can protect proprietary information and maintain a competitive advantage. + +- Enable secure updates to ML models, ensuring that updates come from a trusted source and have not been tampered with in transit. + +The importance of TEEs in ML hardware security stems from their ability to protect against external and internal threats, including the following: + +- **Malicious Software:** TEEs can prevent high-privilege malware from accessing sensitive areas of the ML system. 
+ +- **Physical Tampering:** By integrating with hardware security measures, TEEs can protect against physical tampering that attempts to bypass software security. + +- **Side-channel Attacks:** Although not impenetrable, TEEs can mitigate certain side-channel attacks by controlling access to sensitive operations and data patterns. + +#### Mechanics + +TEEs rest on four main mechanisms: + +- **Isolated Execution:** Code within a TEE runs in a separate environment from the device's main operating system. This isolation protects the code from unauthorized access by other applications. + +- **Secure Storage:** TEEs can store cryptographic keys, authentication tokens, and sensitive data securely, preventing access by regular applications running outside the TEE. + +- **Integrity Protection:** TEEs can verify the integrity of code and data, ensuring that they have not been altered before execution or during storage. + +- **Data Encryption:** Data handled within a TEE can be encrypted, making it unreadable to entities without the proper keys, which are also managed within the TEE. + +Here are some examples of trusted execution environments (TEEs) that provide hardware-based security for sensitive applications: + +- **[ARM TrustZone](https://www.arm.com/technologies/trustzone-for-cortex-m):** Creates secure and normal world execution environments isolated using hardware controls. Implemented in many mobile chipsets. + +- **[Intel SGX](https://www.intel.com/content/www/us/en/architecture-and-technology/software-guard-extensions.html):** Intel's Software Guard Extensions provide an enclave for code execution that protects against certain software attacks, specifically OS layer attacks. Used to safeguard workloads in the cloud. + +- **[Qualcomm Secure Execution Environment](https://www.qualcomm.com/products/features/mobile-security-solutions):** Hardware sandbox on Qualcomm chipsets for mobile payment and authentication apps. 
+ +- **[Apple Secure Enclave](https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web):** TEE for biometric data and key management on iPhones and iPads. Facilitates mobile payments. + +![Figure 6: System on chip showing secure enclave isolated from the main processor to provide an extra layer of security. The secure enclave has a boot ROM to establish a hardware root of trust, an AES engine for efficient and secure cryptographic operations, and protected memory. The secure enclave has a mechanism to store information securely on attached storage separate from the NAND flash storage used by the application processor and operating system. This design keeps sensitive user data secure even when the Application Processor kernel becomes compromised. Credit: [Apple](https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web).](images/security_privacy/image1.png) + +#### Trade-Offs + +If TEEs are so good, why don't all systems have TEE enabled by default? The decision to implement a Trusted Execution Environment (TEE) is not taken lightly. There are several reasons why a TEE might not be present in all systems by default. Here are some trade-offs and challenges associated with TEEs: + +**Cost:** Implementing TEEs involves additional costs. There are direct costs for the hardware and indirect costs associated with developing and maintaining secure software for TEEs. These costs may not be justifiable for all devices, especially low-margin products. + +**Complexity:** TEEs add complexity to system design and development. Integrating a TEE with existing systems requires a substantial redesign of the hardware and software stack, which can be a barrier, especially for legacy systems. + +**Performance Overhead:** While TEEs offer enhanced security, they can introduce performance overhead. For example, the additional steps in verifying and encrypting data can slow down system performance, which may be critical in time-sensitive applications. 
+ +**Development Challenges:** Developing for TEEs requires specialized knowledge and often must adhere to strict development protocols. This can extend development time and complicate the debugging and testing processes. + +**Scalability and Flexibility:** TEEs, due to their secure nature, may impose limitations on scalability and flexibility. Upgrading secure components or scaling the system for more users or data can be more challenging when everything must pass through a secure, enclosed environment. + +**Energy Consumption:** The increased processing required for encryption, decryption, and integrity checks can lead to higher energy consumption, a significant concern for battery-powered devices. + +**Market Demand:** Not all markets or applications require the level of security provided by TEEs. For many consumer applications, the perceived risk may be low enough that manufacturers opt not to include TEEs in their designs. + +**Security Certification and Assurance:** Systems with TEEs may need rigorous security certifications with bodies like [Common Criteria](https://www.commoncriteriaportal.org/ccra/index.cfm) (CC) or the [European Union Agency for Cybersecurity](https://www.enisa.europa.eu/) (ENISA), which can be lengthy and expensive. Some organizations may choose not to implement TEEs to avoid these hurdles. + +**Limited Resource Devices:** Devices with limited processing power, memory, or storage may not be capable of supporting TEEs without compromising their primary functionality. + +### Secure Boot + +#### About + +Secure Boot is a security standard that ensures a device boots using only software that is trusted by the Original Equipment Manufacturer (OEM). When the device starts up, the firmware checks the signature of each piece of boot software, including the bootloader, kernel, and base operating system, to ensure it's not tampered with. If the signatures are valid, the device continues to boot. 
If not, the boot process stops to prevent potential security threats from executing. + +#### Benefits + +The integrity of an ML system is critical from the moment it is powered on. A compromised boot process could undermine the system by allowing malicious software to load before the operating system and ML applications start. This could lead to manipulated ML operations, stolen data, or the device being repurposed for malicious activities such as botnets or crypto-mining. + +Secure Boot helps protect embedded ML hardware in several ways: + +- **Protecting ML Data:** Ensuring that the data used by ML models, which may include private or sensitive information, is not exposed to tampering or theft during the boot process. -- Importance of privacy and security in AI -- Overview of privacy and security challenges in embedded AI -- Significance of user trust and data protection +- **Guarding Model Integrity:** Maintaining the integrity of the ML models themselves, as tampering with the model could lead to incorrect or malicious outcomes. -## Data Privacy in AI Systems +- **Secure Model Updates:** Enabling secure updates to ML models and algorithms, ensuring that updates are authenticated and have not been altered. -Explanation: This section is of utmost importance as it delves into the various ways to protect sensitive data during collection, storage, and processing. Given that AI systems often handle a large amount of personal data, implementing data privacy measures is critical to prevent unauthorized access and misuse. +#### Mechanics -- Data anonymization techniques -- Principles of data minimization -- Legal frameworks governing data privacy +A Trusted Execution Environment (TEE) benefits from Secure Boot in multiple ways. For instance, during initial validation, Secure Boot ensures that the code running inside the TEE is the correct and untampered version approved by the device manufacturer. 
By verifying the digital signatures of the firmware and other critical components, Secure Boot also provides resilience against tampering, preventing unauthorized modifications that could undermine the TEE's security properties. Secure Boot establishes a foundation of trust upon which the TEE can securely operate, enabling secure operations such as cryptographic key management, secure processing, and sensitive data handling. -## Encryption Techniques +![Figure 7: The Secure Boot flow of a trusted embedded system. Source: [@Rashmi2018Secure].](images/security_privacy/image4.png) -Explanation: Encryption techniques are pivotal in safeguarding data at rest and during transmission. In this section, we will explore various encryption methodologies and how they can be used effectively in embedded AI systems to ensure data confidentiality and security. +#### Case Study: Apple's Face ID -- Symmetric and asymmetric encryption -- End-to-end encryption -- Encryption protocols and standards +Let's take a real-world example. Apple's Face ID technology uses advanced machine learning algorithms to enable [facial recognition](https://support.apple.com/en-us/102381) on iPhones and iPads. It relies on a sophisticated framework of sensors and software to accurately map the geometry of a user's face. For Face ID to function securely and protect user biometric data, the device's operations must be trustworthy from the moment it is powered on, which is where Secure Boot plays a crucial role. Here's how Secure Boot works in conjunction with Face ID: -## Secure Multi-Party Computation +**Initial Verification:** When an iPhone is powered on, the Secure Boot process begins in the Secure Enclave, a coprocessor that provides an extra layer of security. The Secure Enclave is responsible for processing fingerprint data for Touch ID and facial recognition data for Face ID. The boot process verifies that the Secure Enclave's firmware is signed by Apple and has not been tampered with. 
This step ensures that the firmware used to process biometric data is authentic and safe to execute. -Explanation: Secure Multi-Party Computation (SMPC) is a cryptographic protocol that allows for the secure sharing of data between multiple parties. This section is vital as it discusses how SMPC can be used to perform computations on encrypted data without revealing the underlying information, which is a significant stride in preserving privacy in AI systems. +**Continuous Security Checks:** After the initial power-on self-test and verification by Secure Boot, the Secure Enclave communicates with the device's main processor to continue the secure boot chain. It verifies the digital signatures of the iOS kernel and other critical boot components before allowing the boot process to proceed. This chained trust model prevents unauthorized modifications to the bootloader and operating system, which could compromise the device's security. -- Basics of SMPC -- Use cases for SMPC in AI -- Challenges and solutions in implementing SMPC +**Face Data Processing:** Once the device has completed its secure boot sequence, the Secure Enclave can interact with the ML algorithms that power Face ID safely. Facial recognition involves projecting and analyzing over 30,000 invisible dots to create a depth map of the user's face and an infrared image. This data is then converted into a mathematical representation compared with the registered face data securely stored in the Secure Enclave. -## Privacy-Preserving Machine Learning +**Secure Enclave and Data Protection:** The Secure Enclave is designed to protect sensitive data and handle the cryptographic operations that secure it. It ensures that even if the operating system kernel is compromised, the facial data cannot be accessed by unauthorized apps or attackers. Face ID data never leaves the device and is not backed up to iCloud or anywhere else. 
-Explanation: This section explores the innovative approaches to developing machine learning models that can operate on encrypted data or provide results without revealing sensitive information. Understanding these concepts is fundamental in designing AI systems that respect user privacy and prevent data exploitation. +**Firmware Updates:** Apple frequently releases firmware updates to address security vulnerabilities and improve the functionality of its systems. Secure Boot ensures that each firmware update is authenticated and that only updates signed by Apple are installed on the device, preserving the integrity and security of the Face ID system. -- Differential privacy -- Homomorphic encryption -- Federated learning +By using Secure Boot with dedicated hardware like the Secure Enclave, Apple can provide strong security assurances for sensitive operations like facial recognition. -## Authentication and Authorization +#### Challenges -Explanation: Authentication and authorization mechanisms are essential to control access to sensitive resources within an AI system. This section will highlight various strategies to securely manage and restrict access to various components in an embedded AI environment, ensuring that only authorized entities can interact with the system. +Implementing Secure Boot poses several challenges that must be addressed to realize its full benefits. -- Role-based access control -- Multi-factor authentication -- Secure tokens and API keys +**Key Management Complexity:** Generating, storing, distributing, rotating, and revoking cryptographic keys in a provably secure manner is extremely challenging, yet vital for maintaining the chain of trust. Any compromise of keys cripples protections. Large enterprises managing multitudes of device keys face particular scale challenges. -## Secure Hardware Enclaves +**Performance Overhead:** Checking cryptographic signatures during boot can add 50-100ms or more per component verified. 
This delay may be prohibitive for time-sensitive or resource-constrained applications. However, performance impacts can be reduced through parallelization and hardware acceleration. -Explanation: This section will dissect how secure hardware enclaves can provide a protected execution environment for critical operations in an embedded AI system. Understanding the role and implementation of hardware enclaves is crucial for building AI systems resistant to both physical and software attacks. +**Signing Burden:** Developers must diligently ensure that all software components involved in the boot process (bootloaders, firmware, OS kernel, drivers, and applications) are correctly signed by trusted keys. Accommodating third-party code signing remains an issue. -- Concepts of hardware enclaves -- Hardware security modules (HSMs) -- Trusted execution environments (TEEs) +**Cryptographic Verification:** Secure algorithms and protocols must validate the legitimacy of keys and signatures, resist tampering or bypass, and support revocation. Accepting dubious keys undermines trust. -## Security Audits and Compliance +**Customizability Constraints:** Vendor-locked Secure Boot architectures limit user control and upgradability. Open-source bootloaders like [U-Boot](https://source.denx.de/u-boot/u-boot) and [coreboot](https://www.coreboot.org/) enable security while supporting customizability. -Explanation: Security audits and compliance are vital components to ensure the continual adherence to privacy and security standards. This section is crucial as it discusses the various methods of conducting security audits and the importance of maintaining compliance with established regulatory frameworks. 
+**Scalable Standards:** Emerging standards like [Device Identifier Composition Engine](https://www.microsoft.com/en-us/research/project/dice-device-identifier-composition-engine/) (DICE) and [IDevID](https://1.ieee802.org/security/802-1ar/) promise to securely provision and manage device identities and keys at scale across ecosystems. -- Security audit methodologies -- Regulatory compliance standards -- Risk assessment and management +Adopting Secure Boot requires following security best practices around key management, crypto validation, signed updates, and access control. Secure Boot provides a robust foundation for building device integrity and trust when implemented with care. + +### Hardware Security Modules + +#### About HSM + +A Hardware Security Module (HSM) is a physical device that manages digital keys for strong authentication and provides crypto-processing. These modules are designed to be tamper-resistant and provide a secure environment for performing cryptographic operations. HSMs can come in standalone devices, plug-in cards, or integrated circuits on another device. + +HSMs are crucial for a range of security-sensitive applications because they offer a hardened, secure enclave for the storage of cryptographic keys and execution of cryptographic functions. They are particularly important for ensuring the security of transactions, identity verifications, and data encryption. + +#### Benefits + +HSMs provide several functionalities that are beneficial for the security of ML systems: + +**Protecting Sensitive Data:** In machine learning applications, models often process sensitive data that can be proprietary or personal. HSMs protect the encryption keys used to secure this data, both at rest and in transit, from exposure or theft. + +**Ensuring Model Integrity:** The integrity of ML models is vital for their reliable operation. 
HSMs can securely manage the signing and verification processes for ML software and firmware, ensuring unauthorized parties have not altered the models. + +**Secure Model Training and Updates:** The training and updating of ML models involve the processing of potentially sensitive data. HSMs ensure that these processes are conducted within a secure cryptographic boundary, protecting against the exposure of training data and unauthorized model updates. + +#### Trade-offs + +HSMs involve several trade-offs for embedded ML. These trade-offs are somewhat similar to TEEs, but for the sake of completeness, we will also discuss them here through the lens of HSM. + +**Cost:** HSMs are specialized devices that can be expensive to procure and implement, which can raise the overall cost of an ML project. This may be a significant factor to consider for embedded systems where cost constraints are often stricter. + +**Performance Overhead:** While secure, the cryptographic operations performed by HSMs can introduce latency. Any added delay can be a critical issue in high-performance embedded ML applications where inference needs to happen in real-time, such as in autonomous vehicles or real-time translation devices. + +**Physical Space:** Embedded systems are often limited by physical space, and adding an HSM can be challenging in tightly constrained environments. This is especially true for consumer electronics and wearable technology, where size and form factor are key considerations. + +**Power Consumption:** HSMs require power for their operation, which can be a drawback for battery-operated devices that rely on long battery life. The secure processing and cryptographic operations can drain the battery faster, a significant trade-off for mobile or remote embedded ML applications. + +**Complexity in Integration:** Integrating HSMs into existing hardware systems adds complexity. 
It often requires specialized knowledge to manage the secure communication between the HSM and the system's processor and develop software capable of interfacing with the HSM. + +**Scalability:** Scaling an ML solution that uses HSMs can be challenging. Managing a fleet of HSMs and ensuring uniformity in security practices across devices can become complex and costly when the deployment size increases, especially when dealing with embedded systems where communication is costly. + +**Operational Complexity:** HSMs can make updating firmware and ML models more complex. Every update must be signed and possibly encrypted, which adds steps to the update process and may require secure mechanisms for key management and update distribution. + +**Development and Maintenance:** The secure nature of HSMs means that only limited personnel have access to the HSM for development and maintenance purposes. This can slow down the development process and make routine maintenance more difficult. + +**Certification and Compliance:** Ensuring that an HSM meets specific industry standards and compliance requirements can add to the time and cost of development. This may involve undergoing rigorous certification processes and audits. + +### Physical Unclonable Functions (PUFs) + +#### About + +Physical Unclonable Functions (PUFs) provide a hardware-intrinsic means for cryptographic key generation and device authentication by harnessing the inherent manufacturing variability in semiconductor components. During fabrication, random physical factors such as doping variations, line edge roughness, and dielectric thickness result in microscale differences between semiconductors, even when produced from the same masks. These create detectable timing and power variances that act as a "fingerprint" unique to each chip. PUFs exploit this phenomenon by incorporating integrated circuits to amplify minute timing or power differences into measurable digital outputs. 
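The idea that fabrication variation becomes a device-unique digital fingerprint can be sketched with a toy software simulation. The example below assumes a simplified additive delay model of an arbiter-style PUF, which is one common textbook abstraction and not the design of any particular chip. The seeds stand in for uncontrollable process variation, and all names (`make_device`, `features`, `respond`, `fingerprint`, `N_STAGES`) are hypothetical. Real PUFs are physical circuits with measurement noise, which this sketch leaves out.

```python
import random

N_STAGES = 64  # hypothetical number of delay stages in the simulated circuit


def make_device(seed):
    """Model one chip: per-stage delay differences fixed at 'fabrication'."""
    rng = random.Random(seed)  # the seed stands in for random process variation
    return [rng.gauss(0.0, 1.0) for _ in range(N_STAGES + 1)]


def features(challenge):
    """Map challenge bits to the +1/-1 parity features of the delay model."""
    phi = [1.0] * (len(challenge) + 1)
    # phi[i] is the product of (1 - 2 * bit) over stages i..end; the last entry is 1.
    for i in range(len(challenge) - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return phi


def respond(device, challenge):
    """Turn the sign of the accumulated delay difference into one response bit."""
    delay_diff = sum(w * p for w, p in zip(device, features(challenge)))
    return 1 if delay_diff > 0 else 0


def fingerprint(device, challenges):
    """Collect the response bits for a list of challenges."""
    return [respond(device, c) for c in challenges]


rng = random.Random(0)
challenges = [[rng.randint(0, 1) for _ in range(N_STAGES)] for _ in range(256)]

chip_a = make_device(seed=1)  # two chips built from the same "masks"
chip_b = make_device(seed=2)

fp_a = fingerprint(chip_a, challenges)
fp_b = fingerprint(chip_b, challenges)
mismatch = sum(x != y for x, y in zip(fp_a, fp_b)) / len(challenges)

print("chip A repeatable:", fingerprint(chip_a, challenges) == fp_a)
print("fraction of challenges where A and B disagree:", mismatch)
```

In this idealized model each simulated chip always answers the same way, while two chips disagree on a large share of challenges. That combination of repeatability and uniqueness is the property a PUF relies on.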
+ +When stimulated with an input challenge, the PUF circuit produces an output response based on the device's intrinsic physical characteristics. Due to their physical uniqueness, the same challenge will yield a different response on other devices. This challenge-response mechanism can be used to generate keys securely and identifiers tied to the specific hardware, perform device authentication, or securely store secrets. For example, a key derived from a PUF will only work on that device and cannot be cloned or extracted even with physical access or full reverse engineering [@Gao2020Physical]. + +#### Benefits + +PUF key generation avoids the need for external key storage which risks exposure. It also provides a foundation for other hardware security primitives like secure boot. Implementation challenges include managing varying reliability and entropy across different PUFs, sensitivity to environmental conditions, and susceptibility to machine learning modeling attacks. When designed carefully, PUFs enable promising applications in IP protection, trusted computing, and anti-counterfeiting. + +#### Utility + +Machine learning models are rapidly becoming a core part of the functionality for many embedded devices like smartphones, smart home assistants, and autonomous drones. However, securing ML on resource-constrained embedded hardware can be challenging. This is where physical unclonable functions (PUFs) come in uniquely handy. Let's look at some examples of how PUFs can be useful. + +PUFs provide a way to generate unique fingerprints and cryptographic keys tied to the physical characteristics of each chip on the device. Let's take an example. We have a smart camera drone that uses embedded ML to track objects. A PUF integrated into the drone's processor could create a device-specific key to encrypt the ML model before loading it onto the drone. 
This way, even if an attacker somehow hacks the drone and tries to steal the model, they won't be able to use it on another device! + +The same PUF key could also create a digital watermark embedded in the ML model. If that model ever gets leaked and posted online by someone trying to pirate it, the watermark could help prove it came from your stolen drone and didn't originate from the attacker. Also, imagine the drone camera connects to the cloud to offload some of its ML processing. The PUF can authenticate that the camera is legitimate before the cloud runs inference on sensitive video feeds. The cloud could verify that the drone has not been physically tampered with by checking that the PUF responses have not changed. + +PUFs enable all this security through the inherent randomness and hardware binding of their challenge-response behavior. Without needing to store keys externally, PUFs are ideal for securing embedded ML with limited resources. Thus, they offer a unique advantage over other mechanisms. + +#### Mechanics + +The working principle behind PUFs involves generating a "challenge-response" pair, where a specific input (the challenge) to the PUF circuit results in an output (the response) that is determined by the unique physical properties of that circuit. This process can be likened to a fingerprinting mechanism for electronic devices. Devices that utilize ML for processing sensor data can employ PUFs to secure communication between devices and prevent the execution of ML models on counterfeit hardware. + +![Figure 8: PUF basics. a) A PUF exploits intrinsic random variation at the microscale or nanoscale. Such random variation resulting from uncontrollable fabrication processes can be conceptually thought of as a unique physical 'fingerprint' of a hardware device. b) Optical PUF. Under illumination from a given angle/polarization, complex interference occurs within an inhomogeneous transparent plastic token.
Then, a two-dimensional (2D) speckle pattern is recorded using a charge-coupled device camera. The angle/polarization is treated as the challenge while the 2D speckle pattern is the response. c) APUF. Consisting of multiple stages (1 to k), the challenge bits (C) select two theoretically identical but practically unequal delay paths at each stage. At the end of the APUF, an arbiter judges whether the top path is faster or not, and reacts with a '1' or '0' response (r). A challenge bit of '0' means two signals pass a given stage in parallel, whilst '1' means two signals cross over. d) SRAM PUF. The mismatch of threshold voltage Vth of the transistors determines the response. For example, if the Vth of M1 is slightly smaller than that of M2, at power-up, the transistor M1 starts conducting before M2, thus, the logic state at point A = '1'. This in turn prevents M2 switching on. As a result, the SRAM power-up state prefers to be '1' (point A = '1', point B = '0'), which is the response, while the address of the memory cell is the challenge. WL, word line; BL, bit line. Source: [@Gao2020Physical].](images/security_privacy/image2.png) + +#### Challenges + +There are a few challenges with PUFs. The PUF response can be sensitive to environmental conditions, such as temperature and voltage fluctuations, leading to inconsistent behavior that must be accounted for in the design. Also, since PUFs can potentially generate many unique challenge-response pairs, managing and ensuring the consistency of these pairs across the device's lifetime can be challenging. Last but not least, integrating PUF technology may increase the overall manufacturing cost of a device, although it can save costs in key management over the device's lifecycle. + +## Privacy Concerns in Data Handling + +Handling personal and sensitive data securely and ethically is critical as machine learning permeates devices like smartphones, wearables, and smart home appliances.
For medical hardware, handling data securely and ethically is further required by law, through the [Health Insurance Portability and Accountability Act](https://aspe.hhs.gov/report/health-insurance-portability-and-accountability-act-1996) (HIPAA). These embedded ML systems pose unique privacy risks given their intimate proximity to users' lives. + +### Sensitive Data Types + +Embedded ML devices like wearables, smart home assistants, and autonomous vehicles frequently process highly personal data that requires careful handling to maintain user privacy and prevent misuse. Specific examples include medical reports and treatment plans processed by health wearables, private conversations continuously captured by smart home assistants, and detailed driving habits collected by connected cars. Compromise of such sensitive data can lead to serious consequences like identity theft, emotional manipulation, public shaming, and mass surveillance overreach. + +Sensitive data takes many forms - structured records like contact lists and unstructured content like conversational audio and video streams. In medical settings, protected health information (PHI) is collected by doctors throughout every interaction, and is heavily regulated by strict HIPAA guidelines. Even outside of medical settings, sensitive data can still be collected in the form of [Personally Identifiable Information](https://www.dol.gov/general/ppii) (PII), which is defined as "any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means." Examples of PII include email addresses, social security numbers, and phone numbers, among other fields. PII is collected in medical settings, as well as other settings (financial applications, etc) and is heavily regulated by Department of Labor policies. + +Even derived model outputs could indirectly leak details about individuals. 
Beyond just personal data, proprietary algorithms and datasets also warrant confidentiality protections. In the Data Engineering section, we covered several of these topics in detail. + +Techniques like de-identification, aggregation, anonymization, and federation can help transform sensitive data into less risky forms while retaining analytical utility. However, diligent controls around access, encryption, auditing, consent, minimization, and compliance practices are still essential throughout the data lifecycle. Regulations like [GDPR](https://gdpr-info.eu/) categorize different classes of sensitive data and prescribe responsibilities around their ethical handling. Standards like [NIST 800-53](https://csrc.nist.gov/pubs/sp/800/53/r5/upd1/final) provide rigorous security control guidance tailored for confidentiality protection. With growing reliance on embedded ML, understanding sensitive data risks is crucial. + +### Applicable Regulations + +Many embedded ML applications handle sensitive user data under HIPAA, GDPR, and CCPA regulations. Understanding the protections mandated by these laws is crucial for building compliant systems. + +- [HIPAA](https://www.hhs.gov/hipaa/for-professionals/privacy/index.html#:~:text=The HIPAA Privacy Rule establishes,care providers that conduct certain) governs medical data privacy and security in the US, with severe penalties for violations. Any health-related embedded ML devices like diagnostic wearables or assistive robots would need to implement controls like audit trails, access controls, and encryption prescribed by HIPAA. + + +- [GDPR](https://gdpr-info.eu/) imposes transparency, retention limits, and user rights around the personal data of people in the EU, even when processed by companies outside the EU. Smart home systems capturing family conversations or location patterns would need GDPR compliance. Key requirements include data minimization, encryption, and mechanisms for consent and erasure.
+ + +- [CCPA](https://oag.ca.gov/privacy/ccpa#:~:text=The CCPA applies to for,, households, or devices; or) in California focuses on protecting consumer data privacy through provisions like required disclosures and opt-out rights. IoT gadgets like smart speakers and fitness trackers used by Californians would likely fall under its scope. + +- The CCPA was the first state-specific set of regulations surrounding privacy concerns. Following the CCPA, similar regulations were also enacted in [10 other states](https://pro.bloomberglaw.com/brief/state-privacy-legislation-tracker/), with some states proposing bills for consumer data privacy protections. + +Additionally, when relevant to the application, sector-specific rules govern telematics, financial services, utilities, etc. Best practices like privacy by design, impact assessments, and maintaining audit trails help embed compliance, if it is not already required by law. Given potentially costly penalties, consulting legal/compliance teams is advisable when developing regulated embedded ML systems. + +### De-identification + +If medical data is de-identified thoroughly, HIPAA guidelines do not directly apply and regulations are far fewer. However, medical data needs to be de-identified using [HIPAA methods](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html) (Safe Harbor methods or Expert Determination methods) in order for HIPAA guidelines to no longer apply. + +#### Safe Harbor Methods + +Safe Harbor methods are most commonly used for de-identifying protected healthcare information, due to the limited resources needed in comparison to Expert Determination methods. Safe Harbor de-identification requires datasets to be scrubbed of any data that falls into one of 18 categories.
The following categories are listed as sensitive information based on the Safe Harbor standard: + +- Name, Geographic locator, Birthdate, Phone Number, Email Address, IP Addresses, Social Security Numbers, Medical Record Numbers, Health Plan Beneficiary Numbers, Device Identifiers and Serial Numbers, Certificate/License Numbers (Birth Certificate, Driver's License, etc.), Account Numbers, Vehicle Identifiers, Website URLs, Full Face Photos and Comparable Images, Biometric Identifiers, Any other unique identifiers + +For a majority of these categories, all data is required to be removed regardless of the circumstances. For other categories, including geographical information and birthdate, the data can be partially removed enough to make the information hard to re-identify. For example, if a zip code is large enough, the first 3 digits of the zip code can still remain, since there are enough people in the geographic area to make re-identification difficult. Birthdates need to be scrubbed of all elements except for birth year, and all ages above 89 years old need to be aggregated into a 90+ category. + +#### Expert Determination Methods + +Safe Harbor methods work for several cases of medical data de-identification, though in some cases, re-identification is still possible. For example, let's say you collect data on a patient in an urban city with a large zip code, but you have documented a rare disease that they have -- a disease which only 25 people have in the entire city. Given geographic data coupled with birth year, it is highly possible that someone can re-identify this individual, which is an extremely detrimental privacy breach. + +In unique cases like these, expert determination methods of de-identification of data are preferred.
Expert determination de-identification requires a "person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable" to evaluate a dataset and determine if the risk of re-identification of individual data in a given dataset, in combination with publicly available data (voting records, etc.), is extremely small. + +Expert Determination de-identification is understandably harder to complete than Safe Harbor de-identification due to the cost and feasibility of accessing an expert to verify the likelihood of re-identifying a dataset. However, in many cases, expert determination is required to ensure that re-identification of data is extremely unlikely. + +### Data Minimization + +Data minimization involves collecting, retaining, and processing only the necessary user data to reduce privacy risks from embedded ML systems. This starts by restricting the data types and instances gathered to the bare minimum required for the system's core functionality. For example, an object detection model only collects the images needed for that specific computer vision task. Similarly, a voice assistant would limit audio capture to specific spoken commands rather than persistently recording ambient sounds. + +Where possible, ephemeral data that briefly resides in memory without being persisted to storage provides additional minimization. A clear legal basis like user consent should be established for any collection and retention. Sandboxing and access controls prevent unauthorized use beyond intended tasks. Retention periods should be defined based on purpose, with secure deletion procedures removing expired data. + +Data minimization can be broken down into [3 categories](https://dl.acm.org/doi/pdf/10.1145/3397271.3401034?casa_token=NrOifKo6dPMAAAAA:Gl5NZNpZMiuSRpJblj43c1cNXkXyv7oEOuYlOfX2qvT8e-9mOLoLQQYz29itxVh6xakKm8haWRs): + +1.
"Data must be *adequate* in relation to the purpose that is pursued." Data omission can limit the accuracy of models trained on the data, and any general usefulness of a dataset. Data minimization requires a minimum amount of data to be collected from users, while still creating a dataset that adds value to others. + +2. The data collected from users must be *relevant* to the purpose of the data collection. + +3. The data collected from users should be *limited* to only the data that is absolutely necessary from users in order to fulfill the purpose of the initial data collection. If similarly robust and accurate results can be obtained from a smaller dataset, any additional data beyond this smaller dataset is not necessary and should not be collected. + +Emerging techniques like differential privacy, federated learning, and synthetic data generation allow for deriving useful insights from less raw user data. Performing data flow mapping and impact assessments helps identify opportunities to minimize raw data usage. + +Methodologies like Privacy by Design [@cavoukian2009privacy] consider such minimization early in system architecture. Regulations like GDPR also mandate data minimization principles. With a multilayered approach across legal, technical, and process realms, data minimization limits risks in embedded ML products. + +#### Case Study - Performance Based Data Minimization + +Performance based data minimization [@Biega2020Oper] focuses on expanding upon the third category of data minimization mentioned above, namely *limitation*. It specifically defines the robustness of model results on a given dataset by certain performance metrics, such that data should not be additionally collected if it does not significantly improve performance. Performance metrics can be divided into two categories: + +1. Global data minimization performance + +a.
Satisfied if a dataset minimizes the amount of per-user data while its mean performance across all data is comparable to the mean performance of the original, unminimized dataset. + +2. Per-user data minimization performance + +a. Satisfied if a dataset minimizes the amount of per-user data while the minimum performance of individual user data is comparable to the minimum performance of individual user data in the original, unminimized dataset. + +Performance based data minimization can be leveraged in several machine learning settings, including recommendation algorithms for movies and e-commerce. For different recommendation algorithms, global data minimization comparisons are shown in Figure 9. + +![Figure 9: Sorted RMSE (a, c) and NDCG (b, d) values for all users when selecting random subsets of items of varying sizes as input to the kNN (a, b) and SVD (c, d) recommendation algorithms. Higher values on the y-axis in plots (a, c) are worse, while higher values on the y-axis in plots (b, d) are better. SVD is more robust to minimization than kNN, with aggressive minimization incurring low quality loss. While error increases as we minimize, the distribution of error remains the same.](images/security_privacy/image12.png) + +This is in comparison to per-user data minimization, shown in Figure 10: + +![Figure 10: RMSE (a) and NDCG (b) variation over the population of users when selecting random subsets of items of varying sizes as an input to the kNN algorithm. The underlying data presented here is the same as in Figure 1, but the data points are sorted by the y-axis value of the Full strategy only. Data points of other selection methods are unsorted and match the users at the ranking positions defined by the sorting of the Full strategy.
This result shows that, while the overall quality loss is low and the error distribution remains the same, the quality loss for individuals can be substantial.](images/security_privacy/image11.png) + +Global data minimization seems to be a much more feasible method of data minimization compared to per-user data minimization, given the much more significant difference in per-user losses between the minimized dataset and original dataset. + +### Consent and Transparency + +Meaningful consent and transparency are crucial when collecting user data for embedded ML products like smart speakers, wearables, and autonomous vehicles. When first set up, the device should ideally explain clearly what data types are gathered, for what purposes, how they are processed, and what the retention policies are. For example, a smart speaker might collect voice samples to train speech recognition and personalized voice profiles. During use, reminders and dashboard options give ongoing transparency into how data is handled, such as weekly digests of voice snippets captured. Control options allow revoking or limiting consent, like disabling storage of voice profiles. + +Consent flows should provide granular controls beyond just binary yes/no choices. For instance, users could selectively consent to certain data uses like training speech recognition but not personalization. Focus groups and usability testing with target users shape consent interfaces and wording of privacy policies to optimize comprehension and control. Respecting user rights like data deletion and rectification demonstrates trustworthiness. Vague legal jargon hampers transparency. Regulations like GDPR and CCPA reinforce consent requirements. Thoughtful consent and transparency provide users agency over their data while building trust in embedded ML products through open communication and control.
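
As a rough illustration of granular, revocable consent, the sketch below keeps one flag per data-use purpose and gates data use on it. The class name `ConsentRecord`, the helper `samples_for_purpose`, and the purpose strings are hypothetical names chosen for this example; a real product would also need persistence, audit logging, and a user interface.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Per-purpose consent flags for one user; nothing is granted by default."""

    granted: dict = field(default_factory=dict)

    def grant(self, purpose):
        self.granted[purpose] = True

    def revoke(self, purpose):
        self.granted[purpose] = False

    def allows(self, purpose):
        # Unknown purposes are treated as "not consented".
        return self.granted.get(purpose, False)


def samples_for_purpose(samples, consent, purpose):
    """Release samples only if the user consented to this specific purpose."""
    return list(samples) if consent.allows(purpose) else []


# Hypothetical smart speaker user: agrees to speech-recognition training,
# but not to personalized voice profiles.
consent = ConsentRecord()
consent.grant("speech_recognition_training")

voice_samples = ["clip_001", "clip_002"]
print(samples_for_purpose(voice_samples, consent, "speech_recognition_training"))
print(samples_for_purpose(voice_samples, consent, "voice_personalization"))

# Consent can be withdrawn later, after which the data is no longer released.
consent.revoke("speech_recognition_training")
print(samples_for_purpose(voice_samples, consent, "speech_recognition_training"))
```

Defaulting every purpose to "not granted" is a deliberate design choice in this sketch: a new data use stays blocked until the user explicitly opts in.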
+ +### Privacy Concerns in Machine Learning + +#### Generative AI + +With the rise of public use of generative AI models, including OpenAI's GPT-4 and other LLMs, privacy and security concerns have also risen. ChatGPT in particular has been discussed more recently in relation to privacy, given all the personal information collected from ChatGPT users. In June 2023, [a class action lawsuit](https://assets.bwbx.io/documents/users/iqjWHBFdfxIU/rIZH4FXwShJE/v0) was filed against OpenAI due to concerns that ChatGPT was trained on proprietary medical and personal information without proper permissions or consent. As a result of these privacy concerns, [many companies](https://www.businessinsider.com/chatgpt-companies-issued-bans-restrictions-openai-ai-amazon-apple-2023-7) have prohibited their employees from accessing ChatGPT and uploading private, company-related information to the chatbot. Further, ChatGPT has been shown to be susceptible to prompt injection attacks and other security attacks that could compromise the privacy of the proprietary data it was trained upon. + +##### Case Study + +While ChatGPT has instituted protections to prevent people from accessing private and ethically questionable information, several individuals have successfully bypassed these protections through prompt injection attacks and other security attacks. As demonstrated in Figure 11, users have been able to bypass ChatGPT protections by asking it to mimic the tone of a "deceased grandmother" in order to learn how to bypass a web application firewall [@Gupta2023ChatGPT]. + +![Figure 11: Grandma role play. A user is able to leverage a "role play" tactic, encouraging ChatGPT to mimic the tone of a "deceased grandmother", to bypass ChatGPT protections and learn how to bypass a web application firewall.](images/security_privacy/image6.png) + +Further, users have also successfully been able to use reverse psychology to manipulate ChatGPT and access information initially prohibited by the model.
In Figure 12, a user is initially prevented from learning about piracy websites through ChatGPT, but is easily able to bypass these restrictions using reverse psychology. + +![Figure 12: By using reverse psychology to bypass ChatGPT safety restrictions, a user is easily able to access prohibited information regarding piracy websites.](images/security_privacy/image10.png) + +The ease with which ChatGPT can be manipulated by security attacks is concerning given the private information it was trained upon without consent. Further research on data privacy in LLMs and generative AI should focus on making models less susceptible to prompt injection attacks. + +#### Data Erasure + +Many of the regulations mentioned above, including GDPR, include a "right to be forgotten" clause. This [clause](https://gdpr-info.eu/art-17-gdpr/) essentially states that "the data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay." However, in several cases, even if user data has been erased from a platform, the data is not completely erased if a machine learning model has been trained on this data for separate purposes. Through methods similar to membership inference attacks, other individuals can still predict the training data that a model was trained upon even if the data's presence was explicitly removed online. + +One approach to addressing privacy concerns with machine learning training data has been through differential privacy methods. Through the addition of Laplacian noise in the training set, for example, a model can be robust to membership inference attacks, preventing deleted data from being recovered. Another approach to preventing deleted data from being inferred through security attacks is simply to retrain the model from scratch on the remaining data.
Since this process is time consuming and computationally expensive, other researchers have attempted to address privacy concerns surrounding inferring model training data through a process called machine unlearning, in which a model actively iterates on itself to remove the influence of "forgotten" data that it might have been trained on, as mentioned below. + +## Privacy-Preserving ML Techniques + +A myriad of techniques have been developed to preserve privacy, each addressing different aspects and challenges of data security. These methods can be broadly categorized into several key areas: +**Differential Privacy**, which focuses on statistical privacy in data outputs; **Federated Learning**, emphasizing decentralized data processing; **Homomorphic Encryption and Secure Multi-party Computation (SMC)**, both enabling secure computations on encrypted or private data; +**Data Anonymization** and **Data Masking and Obfuscation**, which alter data to protect individual identities; **Private Set Intersection** and +**Zero-Knowledge Proofs**, facilitating secure data comparisons and validations; **Decentralized Identifiers (DIDs)** for self-sovereign digital identities; **Privacy-Preserving Record Linkage (PPRL)**, linking data across sources without exposure; **Synthetic Data Generation**, creating artificial datasets for safe analysis; and +**Adversarial Learning Techniques**, enhancing data or model resistance to privacy attacks. + +Given the extensive range of these techniques, it is not feasible to delve into each in depth within a single course or discussion, let alone for any one person to know it all in its glorious detail. Therefore, we will focus on exploring a few specific techniques in relative detail, providing a deeper understanding of their principles, applications, and the unique privacy challenges they address in machine learning. 
This focused approach will allow us to have a more comprehensive and practical understanding of key privacy-preserving methods in the context of modern ML systems. + +### Differential Privacy + +#### Core Idea + +Differential Privacy is a framework for quantifying and managing the privacy of individuals in a dataset [@Dwork2006Theory]. It provides a mathematical guarantee that the privacy of individuals in the dataset will not be compromised, regardless of any additional knowledge an attacker may possess. The core idea of differential privacy is that the outcome of any analysis (like a statistical query) should be essentially the same, whether any individual's data is included in the dataset or not. This means that by observing the result of the analysis, one cannot determine whether any individual's data was used in the computation. + +For example, let's say a database contains medical records for 10 patients. We want to release statistics about the prevalence of diabetes in this sample without revealing any one patient's condition. To do this, we could add a small amount of random noise to the true count before releasing it. If the true number of diabetes patients is 6, we might add noise from a Laplace distribution to randomly output 5, 6, or 7 each with some probability. An observer now can't tell if any single patient has diabetes based only on the noisy output. The query result looks similar whether each patient's data is included or excluded. This is differential privacy. More formally, a randomized algorithm satisfies ε-differential privacy if for any two neighboring databases D and Dʹ differing by only one entry, the probability of any outcome changes by at most a factor of e^ε. A lower ε provides stronger privacy guarantees. + +The Laplace Mechanism is one of the most straightforward and commonly used methods to achieve differential privacy. It involves adding noise that follows a Laplace distribution to the data or query results.
Apart from the Laplace Mechanism, the general principle of adding noise is central to differential privacy. The idea is to add random noise to the data or the results of a query. The noise is calibrated to ensure that it provides the necessary privacy guarantee while keeping the data useful. + +While the Laplace distribution is common, other distributions like Gaussian can also be used. Laplace noise is used for strict ε-differential privacy for low-sensitivity queries, while Gaussian noise can be used when a small probability of exceeding the privacy bound is acceptable, which is known as (ϵ, 𝛿)-differential privacy. In this relaxed version of differential privacy, epsilon and delta are parameters that define the amount of privacy guarantee when releasing information or a model related to a dataset. Epsilon sets a bound on how much information can be learned about the data based on the output, while delta allows for a small probability of the privacy guarantee being violated. The choice between Laplace, Gaussian, and other distributions will depend on the specific requirements of the query and the dataset and the trade-off between privacy and accuracy. + +To illustrate the trade-off of privacy and accuracy in (ϵ, 𝛿)-differential privacy, the following graphs in Figure 13 show the results on accuracy for different noise levels on the MNIST dataset, a large dataset of handwritten digits [@abadi2016deep]. An increasing delta value relaxes the privacy guarantee, so the noise level can be reduced. Since the data will retain many of its original characteristics, accuracy increases at the expense of privacy preservation. This trade-off is fundamental to differential privacy.
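
The diabetes-count example can be sketched in a few lines of Python. This is a minimal illustration, not a production mechanism: the function names `laplace_noise` and `private_count` are made up for this example, a counting query is assumed to have sensitivity 1 (adding or removing one patient changes the count by at most 1), and Python's general-purpose `random` module is used where a real deployment would need a cryptographically secure noise source.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential samples follows a
    Laplace distribution, so no external library is needed.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials, rng):
    """Average distance between the noisy answers and the true count."""
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(true_count, epsilon, rng) - true_count)
    return total / trials


rng = random.Random(42)
true_diabetes_count = 6  # the example from the text: 6 of 10 patients

print("one noisy release (epsilon = 1.0):", private_count(true_diabetes_count, 1.0, rng))
for epsilon in (0.1, 1.0, 10.0):
    err = mean_abs_error(true_diabetes_count, epsilon, 20000, rng)
    print(f"epsilon = {epsilon}: mean absolute error = {err:.2f}")
```

Running the loop shows the balance discussed above: a smaller ε means a larger noise scale and therefore a larger typical error, while a larger ε gives answers closer to the true count but a weaker privacy guarantee.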
+ +![Figure 13: Tradeoffs between privacy (as represented by 𝛿) and accuracy throughout the training process for models trained on data containing low, medium, and high levels of noise.](images/security_privacy/image8.png) + +The key points to remember about differential privacy are the following: + +- **Adding Noise:** The fundamental technique in differential privacy is adding controlled random noise to the data or query results. This noise masks the contribution of individual data points. + + +- **Balancing Act:** There's a balance between privacy and accuracy. More noise (lower ϵ) in the data means higher privacy but less accuracy in the model's results. + + +- **Universality:** Differential privacy doesn't rely on assumptions about what an attacker knows. This makes it robust against re-identification attacks, where an attacker tries to uncover individual data. + + +- **Applicability:** It's applicable to various types of data and queries, making it a versatile tool for privacy-preserving data analysis. + +#### Trade-offs + +There are several trade-offs to make with differential privacy, as is the case with any algorithm. Since we care about ML systems, let's focus on the key computational considerations and trade-offs when implementing differential privacy in a machine learning system: + +**Noise generation:** One major consideration is the need to securely generate random noise from distributions like Laplace or Gaussian that gets added to query results and model outputs. High-quality cryptographic random number generation can be computationally expensive. + +**Sensitivity analysis:** Another key requirement is rigorously tracking the sensitivity of the underlying algorithms to single data points getting added or removed.
This global sensitivity analysis is required to properly calibrate the noise levels. However, for complex model training procedures and data pipelines, analyzing worst-case sensitivity can substantially increase computational complexity. + +**Privacy budget management:** Managing the privacy loss budget across multiple queries and learning iterations is another bookkeeping overhead. The system needs to keep track of cumulative privacy costs and compose them to reason about overall privacy guarantees. This adds computational burden beyond just running queries or training models. + +**Batch vs online tradeoffs:** For online learning systems with continuous high-volume queries, differentially private algorithms require new mechanisms to maintain utility and prevent too much accumulated privacy loss since each query has the potential to alter the privacy budget. Batch offline processing is simpler from a computational perspective as it processes data in large batches where each batch is treated as a single query. High-dimensional sparse data also increases sensitivity analysis challenges. + +**Distributed training:** When training models using [distributed](./training.qmd) or [federated](./optimizations) approaches, new cryptographic protocols are needed to track and bound privacy leakage across nodes. Secure multi-party computation with encrypted data for differential privacy also adds substantial computational load. + +While differential privacy provides strong formal privacy guarantees, implementing it rigorously requires additions and modifications to the machine learning pipeline that come at a computational cost. Managing these overheads while preserving model accuracy remains an active research area. 
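The privacy budget bookkeeping described above can be sketched as a small accountant class. This is an illustrative sketch only: the class and method names are hypothetical, and it uses basic sequential composition, in which the ϵ and 𝛿 values of successive releases simply add up. That is the simplest valid bound; tighter accounting methods exist but are not shown here.

```python
class PrivacyBudget:
    """Tracks cumulative (epsilon, delta) under basic sequential composition."""

    def __init__(self, epsilon_total, delta_total=0.0):
        self.epsilon_total = epsilon_total
        self.delta_total = delta_total
        self.epsilon_spent = 0.0
        self.delta_spent = 0.0

    def spend(self, epsilon, delta=0.0):
        """Charge one query or training step; refuse it if the budget would be exceeded."""
        new_epsilon = self.epsilon_spent + epsilon
        new_delta = self.delta_spent + delta
        # Small tolerance guards against floating-point round-off.
        if (new_epsilon > self.epsilon_total + 1e-12
                or new_delta > self.delta_total + 1e-12):
            raise RuntimeError("privacy budget exhausted")
        self.epsilon_spent, self.delta_spent = new_epsilon, new_delta

    def remaining(self):
        """Return the (epsilon, delta) still available."""
        return (self.epsilon_total - self.epsilon_spent,
                self.delta_total - self.delta_spent)


budget = PrivacyBudget(epsilon_total=1.0, delta_total=1e-5)
budget.spend(0.4)          # first query
budget.spend(0.4, 1e-6)    # second query
# A third call to budget.spend(0.4) would raise, since 1.2 > 1.0.
```

Checking the budget before every release is what makes this an overhead on top of the query or training step itself, particularly in online settings where queries arrive continuously.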
+ +#### Case Study + +[Apple's implementation of differential privacy](https://machinelearning.apple.com/research/learning-with-privacy-at-scale#DMNS06) in iOS and macOS provides a prominent real-world example of [how differential privacy can be deployed at large scale](https://docs-assets.developer.apple.com/ml-research/papers/learning-with-privacy-at-scale.pdf). Apple wanted to collect aggregated usage statistics across their ecosystem to improve products and services, but aimed to do so without compromising individual user privacy. + +To achieve this, they implemented differential privacy techniques directly on user devices to anonymize data points before they are sent to Apple servers. Specifically, each record is randomized on the device by injecting carefully calibrated random noise. For example, if a user's location history contained [Work, Home, Work, Gym, Work, Home], the differentially private version might replace the exact locations with a noisy sample like [Gym, Home, Work, Work, Home, Work]. + +Apple tunes the noise distribution to provide a high level of privacy while still preserving the utility of the aggregated statistics. Increasing noise levels provides stronger privacy guarantees (lower ϵ values in DP terminology), but can reduce data utility. Apple's privacy engineers empirically optimized this tradeoff based on their product goals. + +By aggregating hundreds of millions of noisy data points from devices, Apple obtains high-fidelity aggregated statistics. For instance, they can analyze features used in new iOS apps while provably masking any one user's app behaviors. On-device computation avoids sending raw data to Apple servers. + +The system uses hardware-based secure random number generation to efficiently sample the noise on devices. Apple also had to optimize their differentially private algorithms and pipeline to operate under the computational constraints of consumer hardware. 
+ +Multiple third-party audits have verified that Apple's system provides rigorous differential privacy protections in line with their stated policies. Of course, assumptions around composition over time and potential re-identification risks still apply. But overall, Apple's deployment shows how differential privacy can be realized in large real-world products when backed by sufficient engineering resources. + +### Federated Learning + +#### Core Idea + +Federated Learning (FL) is a type of machine learning where the process of building a model is distributed across multiple devices or servers, while keeping the training data localized. It was previously discussed in the [Model Optimizations](./optimizations.qmd) chapter, but we recap it here briefly for completeness and focus on the aspects that pertain to this chapter. + +FL aims to train machine learning models across decentralized networks of devices or systems while keeping all training data localized. In FL, each participating device uses its local data to calculate model updates, which are then aggregated to build an improved global model. However, the raw training data itself is never directly shared, transferred, or compiled together. This privacy-preserving approach allows ML models to be developed jointly without centralizing the potentially sensitive training data in one place. + +![Figure 14: The FL lifecycle from [@MAL-083]. The training data always remains on the client devices; the model is repeatedly sent back and forth between the individual devices and the server for local updates and for compiling the global model, respectively.](images/security_privacy/image7.png) + +One of the most common model aggregation algorithms is Federated Averaging (FedAvg), where the global model is created by averaging the parameters of the locally trained models.
While FedAvg works well with independent and identically distributed (IID) data, alternative algorithms like Federated Proximal (FedProx) are crucial in real-world applications where data is often non-IID. FedProx is designed for FL settings with significant heterogeneity in the client updates, caused by diverse data distributions across devices, differing computational capabilities, or varied amounts of data. + +By leaving the raw data distributed and exchanging only temporary model updates, federated learning provides a more secure and privacy-enhancing alternative to traditional centralized machine learning pipelines. This allows organizations and users to collaboratively benefit from shared models while maintaining control and ownership over their sensitive data. The decentralized nature of FL also makes it robust to single points of failure. + +Imagine a group of hospitals that want to collaborate on a study to predict patient outcomes based on their symptoms. However, due to privacy concerns and regulations like HIPAA, they cannot share their patient data with each other. Here's how Federated Learning can help: + +- **Local Training:** Each hospital trains a machine learning model on its own patient data. This training happens locally, meaning the data never leaves the hospital's servers. + +- **Model Sharing:** After training, each hospital sends only the model (specifically, the parameters or weights of the model) to a central server. They do not send any patient data. + +- **Aggregating Models:** The central server aggregates these models from all hospitals into a single, more robust model. This process typically involves averaging the model parameters. + +- **Benefit:** The end result is a machine learning model that has learned from a wide range of patient data without any of that sensitive data having to be shared or leave its original location. + +#### Trade-offs + +There are several system performance-related aspects of FL in machine learning systems.
It would be wise to understand these trade-offs because there is no "free lunch" for preserving privacy through FL [@Li2020Federated]. + +**Communication Overhead and Network Constraints:** In FL, one of the most significant challenges is managing the communication overhead. Model updates are transmitted frequently between a central server and numerous client devices, which can be bandwidth-intensive, and with a large number of participants this can lead to substantial network traffic. Reducing communication means reducing both the total number of communication rounds and the size of the messages transmitted per round. Latency is also a critical factor: the time taken for updates to be sent, aggregated, and redistributed can introduce delays. This affects not only the overall training time but also the responsiveness and real-time capabilities of the system. Efficiently managing this communication while minimizing bandwidth usage and latency is crucial for the practical implementation of FL. + +**Computational Load on Local Devices:** FL relies on client devices for model training (such as smartphones or IoT devices, which matters especially in TinyML), and these often have limited computational power and battery life. Running complex machine learning algorithms locally can strain these resources, leading to potential performance issues. Moreover, the capabilities of these devices can vary significantly, resulting in uneven contributions to the model training process. Some devices might process updates faster and more efficiently than others, leading to disparities in the learning process. Balancing the computational load to ensure consistent participation and efficiency across all devices is a key challenge in FL. + +**Model Training Efficiency:** The decentralized nature of FL can impact the efficiency of model training.
Achieving convergence, where the model no longer significantly improves, can be slower in FL compared to centralized training methods. This is particularly true in cases where the data is non-IID (non-independent and identically distributed) across devices. Additionally, the algorithms used for aggregating model updates play a critical role in the training process. Their efficiency directly affects the speed and effectiveness of learning. Developing and implementing algorithms that can handle the complexities of FL while ensuring timely convergence is essential for the system's performance. + +**Scalability Challenges:** Scalability is a significant concern in FL, especially as the number of participating devices increases. Managing and coordinating model updates from a large number of devices adds complexity and can strain the system. Ensuring that the system architecture can efficiently handle this increased load without degrading performance is crucial. This involves not just handling the computational and communication aspects but also maintaining the quality and consistency of the model as the scale of the operation grows. Designing FL systems that can scale effectively while maintaining performance is a key challenge. + +**Data Synchronization and Consistency:** Ensuring data synchronization and maintaining model consistency across all participating devices in FL is challenging. In environments with intermittent connectivity or devices that go offline periodically, keeping all devices synchronized with the latest model version can be difficult. Furthermore, maintaining consistency in the learned model, especially when dealing with a wide range of devices with different data distributions and update frequencies, is crucial. This requires sophisticated synchronization and aggregation strategies to ensure that the final model accurately reflects the learnings from all devices. 
+ +**Energy Consumption:** The energy consumption of client devices in FL is a critical factor, particularly for battery-powered devices like smartphones and other TinyML/IoT devices. The computational demands of training models locally can lead to significant battery drain, which might discourage continuous participation in the FL process. Balancing the computational requirements of model training with energy efficiency is essential. This involves optimizing algorithms and training processes to reduce energy consumption while still achieving effective learning outcomes. Ensuring energy-efficient operation is key to user acceptance and the sustainability of FL systems. + +#### Case Studies + +Here are a couple of real-world case studies that can illustrate the use of federated learning: + +##### Google Gboard + +Google uses federated learning to improve predictions on its Gboard mobile keyboard app. The app runs a federated learning algorithm on users' devices to learn from their local usage patterns and text predictions while keeping user data private. The model updates are aggregated in the cloud to produce an enhanced global model. This allows providing next-word prediction personalized to each user's typing style, while avoiding directly collecting sensitive typing data. Google reported the federated learning approach reduced prediction errors by 25% compared to baseline while preserving privacy. + +##### Healthcare Research + +The UK Biobank and American College of Cardiology combined datasets to train a model for heart arrhythmia detection using federated learning. The datasets could not be combined directly due to legal and privacy restrictions. Federated learning allowed collaborative model development without sharing protected health data, with only model updates exchanged between the parties. This improved model accuracy as it could leverage a wider diversity of training data while meeting regulatory requirements. 
+ +##### Financial Services + +Banks are exploring using federated learning for anti-money laundering (AML) detection models. Multiple banks could jointly improve AML models without having to share confidential customer transaction data with competitors or third parties. Only the model updates need to be aggregated rather than raw transaction data. This allows access to richer training data from diverse sources while avoiding regulatory and confidentiality issues around sharing sensitive financial customer data. + +These examples demonstrate how federated learning provides tangible privacy benefits and enables collaborative ML in settings where direct data sharing is not possible. + +### Machine Unlearning + +#### Core Idea + +Machine unlearning is a fairly new field that describes methods for removing the influence of a subset of training data from a trained model. A baseline approach is to simply fine-tune the model for more epochs on just the data that should be remembered, in order to decrease the influence of the data that should be "forgotten" by the model. Since this approach doesn't explicitly remove the influence of the data that should be erased, membership inference attacks are still possible, so researchers have adopted other approaches that explicitly unlearn data from a model. One such approach adjusts the model's loss function to explicitly treat the losses of the "forget set" (data to be unlearned) and the "retain set" (remaining data that should still be remembered) differently [@tarun2023deep; @khan2021knowledgeadaptation]. 
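To make the idea of treating the two sets differently concrete, here is a minimal sketch of one possible combined objective: the usual loss on the retain set minus a weighted loss on the forget set, so that minimizing it keeps retain-set performance while pushing the model away from the forget set. This is a generic illustration and not the specific formulation of the cited papers; the function names and the `forget_weight` parameter are assumptions.

```python
import math


def cross_entropy(prob_of_true_label):
    """Cross-entropy loss for one example, given the probability of its true label."""
    return -math.log(prob_of_true_label)


def unlearning_objective(retain_probs, forget_probs, forget_weight=1.0):
    """Retain-set loss minus weighted forget-set loss (lower is better).

    retain_probs / forget_probs: model probabilities assigned to the true
    label of each example in the retain set and forget set, respectively.
    """
    retain_loss = sum(cross_entropy(p) for p in retain_probs) / len(retain_probs)
    forget_loss = sum(cross_entropy(p) for p in forget_probs) / len(forget_probs)
    return retain_loss - forget_weight * forget_loss


# The model is equally good on the retain set in both cases, but the second
# model is much less confident on the forget set, so its objective is lower.
before = unlearning_objective([0.9, 0.8], [0.9])
after = unlearning_objective([0.9, 0.8], [0.1])
```

In practice the forget term is usually bounded or regularized, since maximizing a loss without limit can damage the model on unrelated data.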
+ +#### Case Study + +Some researchers have demonstrated a real-life example of machine unlearning applied to SOTA machine learning models by training an LLM, LLaMA2-7b, to unlearn any references to Harry Potter [@eldan2023whos]. Though this model took 184K GPU-hours to pretrain, it took only 1 GPU-hour of fine-tuning to erase the model's ability to generate or recall Harry Potter-related content, without noticeably compromising the accuracy of generating content unrelated to Harry Potter. Table 1 demonstrates how the model output changes before and after unlearning has occurred. + +![Table 1: Examples of Harry Potter related prompts, and resulting model outputs from the original LLaMA2-7b and the fine-tuned LLaMA2-7b that has unlearned references to Harry Potter. The original LLaMA2-7b outputs contain information specific to Harry Potter, while the fine-tuned LLaMA2-7b outputs generic information that does not demonstrate knowledge of Harry Potter.](images/security_privacy/image13.png) + +#### Other Uses + +##### Removing adversarial data + +Deep learning models have previously been shown to be vulnerable to adversarial attacks, in which the attacker generates adversarial data similar to the original training data, to the point where a human cannot tell the difference between the real and fabricated data. The adversarial data causes the model to output incorrect predictions, which could have detrimental consequences in various applications, including healthcare diagnosis predictions. Machine unlearning has been used to [unlearn the influence of adversarial data](https://arxiv.org/pdf/2209.02299.pdf) to prevent these incorrect predictions from occurring and causing harm. + +### Homomorphic Encryption + +#### Core Idea + +Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, generating an encrypted result that, when decrypted, matches the result of operations performed on the plaintext. 
For example, multiplying two numbers encrypted with homomorphic encryption produces an encrypted product that decrypts to the actual product of the two numbers. This means that data can be processed in an encrypted form, and only the resulting output needs to be decrypted, significantly enhancing data security, especially for sensitive information. + +Homomorphic encryption enables outsourced computation on encrypted data without exposing the data itself to the external party performing the operations. However, partially homomorphic schemes support only a single type of operation, such as addition or multiplication. Fully homomorphic encryption (FHE), which can handle any computation, is even more complex, and the number of operations that can be applied is limited before noise accumulation corrupts the ciphertext. + +To use homomorphic encryption across different entities, carefully generated public keys need to be exchanged to carry out operations across separately encrypted data. This advanced encryption technique enables previously impossible secure computation paradigms but requires expertise to implement correctly for real-world systems. + +#### Benefits + +Homomorphic encryption enables machine learning model training and inference on encrypted data, ensuring that sensitive inputs and intermediate values remain confidential. This is critical in healthcare, finance, genetics, and other domains increasingly relying on ML to analyze sensitive and regulated data sets containing billions of personal records. + +By keeping data encrypted throughout computation, homomorphic encryption helps mitigate attacks like model extraction and membership inference that could expose private data used in ML workflows. It provides an alternative to trusted execution environments using hardware enclaves for confidential computing. However, current schemes have high computational overheads and algorithmic limitations that constrain real-world applications. 
+ +Homomorphic encryption realizes the decades-old vision of secure computation on encrypted data by allowing computation directly on ciphertexts. After being conceptualized in the 1970s, the first fully homomorphic cryptosystems emerged in 2009, enabling arbitrary computations. Ongoing research is making these techniques more efficient and practical. + +Homomorphic encryption shows great promise in enabling privacy-preserving machine learning under emerging data regulations. However, given its constraints, one should carefully evaluate its applicability against other confidential computing approaches. Extensive resources exist to explore homomorphic encryption and track progress in easing adoption barriers. + +#### Mechanics + +1. **Data Encryption:** Before data is processed or sent to an ML model, it is encrypted using a homomorphic encryption scheme and public key. For example, encrypting numbers $x$ and $y$ generates ciphertexts $E(x)$ and $E(y)$. + +2. **Computation on Ciphertext:** The ML algorithm processes the encrypted data directly. For instance, multiplying the ciphertexts $E(x)$ and $E(y)$ generates $E(xy)$. More complex model training can also be done on ciphertexts. + +3. **Result Decryption:** The result $E(xy)$ remains encrypted and can only be decrypted by someone with the corresponding private key to reveal the actual product $xy$. + +Only authorized parties with the private key can decrypt the final outputs, protecting the intermediate state. However, noise accumulates with each operation, eventually preventing further computation without decryption. + +Beyond healthcare, homomorphic encryption enables confidential computing for applications like financial fraud detection, insurance analytics, genetics research, and more. It offers an alternative to techniques like multi-party computation and TEEs. Ongoing research aims to improve the efficiency and capabilities. 
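The three mechanics steps above can be illustrated with a toy example. The sketch below uses textbook RSA with tiny primes, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. This is an assumption chosen purely for illustration. Textbook RSA is insecure (it is deterministic and unpadded), it supports only multiplication, and it does not exhibit the noise growth described above, which is a property of the lattice-based schemes used in practice.

```python
# Toy key generation with tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)


def encrypt(m):
    """E(m) = m^e mod n, using the public key (e, n)."""
    return pow(m, e, n)


def decrypt(c):
    """D(c) = c^d mod n, using the private key d."""
    return pow(c, d, n)


# Step 1: encrypt the inputs.
x, y = 7, 12
cx, cy = encrypt(x), encrypt(y)

# Step 2: compute on ciphertexts only; no private key is needed here.
c_product = (cx * cy) % n

# Step 3: only the private-key holder can recover the product.
assert decrypt(c_product) == (x * y) % n
```

The party performing step 2 sees only `cx`, `cy`, and `c_product`, never `x`, `y`, or their product. The result is correct as long as the true product is smaller than the modulus `n`.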
+ +Tools like HElib, SEAL, and TensorFlow HE provide libraries to explore implementing homomorphic encryption for real-world machine learning pipelines. + +#### Trade-offs + +For many real-time and embedded applications, fully homomorphic encryption remains impractical for the following reasons. + +**Computational Overhead:** Homomorphic encryption imposes very high computational overheads, often resulting in slowdowns of over 100x for real-world ML applications. This makes it impractical for many time-sensitive or resource-constrained uses. Optimized hardware and parallelization can help but not eliminate this issue. + +**Complexity of Implementation:** The sophisticated algorithms require deep expertise in cryptography to implement correctly. Nuances like format compatibility with floating point ML models and scalable key management pose hurdles. This complexity hinders widespread practical adoption. + +**Algorithmic Limitations:** Current schemes restrict the functions and depth of computations supported, limiting the models and data volumes that can be processed. Ongoing research is pushing these boundaries but restrictions remain. + +**Hardware Acceleration:** To be feasible, homomorphic encryption requires specialized hardware like secure processors or co-processors with trusted execution environments. This adds design and infrastructure costs. + +**Hybrid Designs:** Rather than encrypting entire workflows, selective application of homomorphic encryption to critical subcomponents can achieve protection while minimizing overheads. + +### Secure Multi-Party Computation + +#### Core Idea + +The overarching goal of secure multi-party computation (MPC) is to enable different parties to jointly compute a function over their inputs while keeping those inputs private. For example, two organizations may want to collaborate on training a machine learning model by combining their respective data sets, but cannot directly reveal that data to each other due to privacy or confidentiality constraints.
MPC aims to provide protocols and techniques that allow them to achieve the benefits of pooled data for model accuracy, without compromising the privacy of each organization's sensitive data. + +At a high level, MPC works by carefully splitting up the computation into separate parts that can be executed independently by each party using their own private input. The results are then combined in a manner that reveals only the final output of the function and nothing about the intermediate values. Cryptographic techniques are used to provably guarantee that the partial results remain private. + +Let's take a simple example of an MPC protocol. One of the most basic MPC protocols is secure addition of two numbers. Each party splits its input into random shares, exchanges one share with the other party, and locally adds the shares it holds; combining these partial sums reconstructs the final sum without revealing the individual inputs. For example, if Alice has input $x$ and Bob has input $y$: + +1. Alice generates random $x_1$ and sets $x_2 = x - x_1$ + +2. Bob generates random $y_1$ and sets $y_2 = y - y_1$ + +3. Alice sends $x_1$ to Bob, Bob sends $y_1$ to Alice (keeping $x_2$ and $y_2$ secret) + +4. Alice computes $x_2 + y_1 = s_1$, Bob computes $x_1 + y_2 = s_2$ + +5. $s_1 + s_2 = x + y$ is the final sum, without revealing $x$ or $y$. + +Alice's and Bob's individual inputs ($x$ and $y$) remain private, and each party reveals only one number associated with its original input. The random splits ensure that no information about the original numbers is disclosed. + +**Secure Comparison:** Another basic operation is secure comparison of two numbers, determining which is greater than the other. This can be done using techniques like Yao's Garbled Circuits where the comparison circuit is encrypted to allow joint evaluation on the inputs without leaking them. + +**Secure Matrix Multiplication:** Matrix operations like multiplication are essential for machine learning.
MPC techniques like additive secret sharing can be used to split matrices into random shares, compute products on the shares, then reconstruct the result. + +**Secure Model Training:** Distributed machine learning training algorithms like federated averaging can be made secure using MPC. Model updates computed on partitioned data at each node are secretly shared between nodes and aggregated to train the global model without exposing individual updates. + +The core idea behind MPC protocols is to divide the computation into steps that can be executed jointly without revealing intermediate sensitive data. This is accomplished by combining cryptographic techniques like secret sharing, homomorphic encryption, oblivious transfer, and garbled circuits. MPC protocols enable collaborative computation on sensitive data while providing provable privacy guarantees. This privacy-preserving capability is essential for many machine learning applications today involving multiple parties that cannot directly share their raw data. + +The main approaches used in MPC include: + +- **Homomorphic encryption:** Special encryption allows computations to be carried out on encrypted data without decrypting it. + +- **Secret sharing:** The private data is divided into random shares that are distributed to each party. Computations are done locally on the shares and finally reconstructed. + +- **Oblivious transfer:** A protocol where a receiver obtains a subset of data from a sender, but the sender does not know which specific data was transferred. + +- **Garbled circuits:** The function to be computed is represented as a Boolean circuit that is encrypted ("garbled") in a way that allows joint evaluation without revealing inputs. + +#### Trade-offs + +While MPC protocols provide strong privacy guarantees, they come at a high computational cost compared to plain computations. 
Every secure operation like addition, multiplication, comparison, etc requires orders of magnitude more processing than the equivalent unencrypted operation. This overhead stems from the underlying cryptographic techniques: + +- In partially homomorphic encryption, each computation on ciphertexts requires costly public-key operations. Fully homomorphic encryption has even higher overheads. + +- Secret sharing divides data into multiple shares, so even basic operations require manipulating many shares. + +- Oblivious transfer and garbled circuits add masking and encryption to hide data access patterns and execution flows. + +- MPC systems require extensive communication and interaction between parties to jointly compute on shares/ciphertexts. + +As a result, MPC protocols can slow down computations by 3-4 orders of magnitude compared to plain implementations. This becomes prohibitively expensive for large datasets and models. Therefore, training machine learning models on encrypted data using MPC remains infeasible today for realistic dataset sizes due to the overhead. Clever optimizations and approximations are needed to make MPC practical. + +Ongoing MPC research aims to close this efficiency gap through cryptographic advances, new algorithms, trusted hardware like SGX enclaves, and leveraging accelerators like GPUs/TPUs. But for the foreseeable future, some degree of approximation and performance tradeoff is likely needed to scale MPC to the demands of real-world machine learning systems. + +### Synthetic Data Generation + +#### Core Idea + +Synthetic data generation has emerged as an important privacy-preserving machine learning approach that allows models to be developed and tested without exposing real user data. The key idea is to train generative models on real-world datasets, then sample from these models to synthesize artificial data that statistically matches the original data distribution but does not contain actual user information. 
For example, a GAN could be trained on a dataset of sensitive medical records to learn the underlying patterns, then used to sample synthetic patient data. + +The primary challenge of synthesizing data is to ensure adversaries are unable to re-identify the original dataset. A simple approach to producing synthetic data is to add noise to the original dataset, but this still risks privacy leakage. When noise is added in the context of differential privacy, sophisticated mechanisms calibrate the amount and distribution of the noise to the sensitivity of the data. Through these mathematically rigorous frameworks, differential privacy guarantees a quantifiable level of privacy, which is the primary goal of this privacy-preserving technique. Beyond preserving privacy, synthetic data also helps with multiple data availability issues such as imbalanced datasets, scarce datasets, and anomaly detection. + +Researchers can freely share this synthetic data and collaborate on modeling without revealing any private medical information. Well-constructed synthetic data protects privacy while providing utility for developing accurate models. Key techniques to prevent reconstruction of the original data include adding differential privacy noise during training, enforcing plausibility constraints, and using multiple diverse generative models. Here are some common approaches for generating synthetic data: + +- **Generative Adversarial Networks (GANs):** GANs (see Figure 15) are a type of AI algorithm used in unsupervised learning where two neural networks contest against each other in a game. The generator network is responsible for producing the synthetic data and the discriminator network evaluates the authenticity of the data by distinguishing between fake data created by the generator network and the real data. The discriminator acts as a metric on how similar the fake and real data are to one another.
GANs are highly effective at generating realistic data and are, therefore, a popular approach for generating synthetic data. + +![Figure 15: Flowchart of GANs, demonstrating how a generator synthesizes fake data to send as an input to the discriminator, which distinguishes between the fake and real data in order to evaluate the authenticity of the data.](images/security_privacy/image9.png) + +- **Variational Autoencoders (VAEs):** VAEs are neural networks that are capable of learning complex probability distributions and that balance data generation quality against computational efficiency. They encode data into a latent space where they learn the distribution in order to decode the data back. + +- **Data Augmentation:** This involves applying transformations to existing data to create new, altered data. For example, flipping, rotating, and scaling (uniformly or non-uniformly) original images can help create a more diverse, robust image dataset before training an ML model. + +- **Simulations:** Mathematical models can simulate real-world systems or processes to mimic real-world phenomena. This is highly useful in scientific research, urban planning, and economics. + +#### Benefits + +While synthetic data may be necessary due to privacy or compliance risks, it is also widely used in machine learning when the available data is of poor quality, scarce, or inaccessible. Synthetic data offers more efficient and effective development by streamlining robust model training, testing, and deployment processes. It allows models to be shared more widely among researchers without breaching privacy laws and regulations. It also facilitates collaboration between users of the same dataset, which helps broaden the capabilities and advancements in ML research. + +There are several motivations for using synthetic data in machine learning: + +- **Privacy and compliance:** Synthetic data avoids exposing personal information, allowing more open sharing and collaboration.
This is important when working with sensitive datasets like healthcare records or financial information. + +- **Data scarcity:** When insufficient real-world data is available, synthetic data can augment training datasets. This can improve model accuracy when limited data is a bottleneck. + +- **Model testing:** Synthetic data provides privacy-safe sandboxes for testing model performance, debugging issues, and monitoring for bias. + +- **Data labeling:** High-quality labeled training data is often scarce and expensive. Synthetic data can help auto-generate labeled examples. + +#### Trade-offs + +While synthetic data aims to remove any evidence of the original dataset, privacy leakage is still a risk because the synthetic data mimics the original data. The statistical information and distribution are similar, if not the same, between the original and synthetic data. By resampling from the distribution, adversaries may still be able to recover the original training samples. Due to their inherent learning processes and complexities, neural networks might accidentally reveal sensitive information about the original training data. + +A core challenge with synthetic data is the potential gap between synthetic and real-world data distributions. Despite advancements in generative modeling techniques, synthetic data may not fully capture the complexity, diversity, and nuanced patterns of real data. This can limit the utility of synthetic data for robustly training machine learning models. Rigorously evaluating synthetic data quality through techniques like adversarial methods and comparing model performance against real data benchmarks helps assess and improve fidelity. But inherently, synthetic data remains an approximation. + +Another critical concern is the privacy risk of synthetic data. Generative models may leak identifiable information about individuals in the training data that could enable reconstruction of private information. 
Emerging adversarial attacks demonstrate the challenges in preventing identity leakage from synthetic data generation pipelines. Techniques like differential privacy can help safeguard privacy but come with tradeoffs in data utility. There is an inherent tension between producing useful synthetic data and fully protecting sensitive training data that must be balanced. + +Additional pitfalls of synthetic data include amplified biases, labeling difficulties, computational overhead of training generative models, storage costs, and failure to account for out-of-distribution novel data. While these are secondary to the core synthetic-real gap and privacy risks, they remain important considerations when evaluating the suitability of synthetic data for particular machine learning tasks. As with any technique, the advantages of synthetic data come with inherent tradeoffs and limitations that require thoughtful mitigation strategies. + +### Summary + +While all the techniques we have discussed thus far aim to enable privacy-preserving machine learning, they involve distinct mechanisms and tradeoffs. Factors like computational constraints, required trust assumptions, threat models, and data characteristics help guide the selection process for a particular use case. But finding the right balance between privacy, accuracy, and efficiency necessitates experimentation and empirical evaluation for many applications. Below is a comparison table of the key privacy-preserving machine learning techniques and their pros and cons: +
+| Technique | Pros | Cons |
+|-|-|-|
+| Differential Privacy | Strong formal privacy guarantees<br>Robust to auxiliary data attacks<br>Versatile for many data types and analyses | Accuracy loss from noise addition<br>Computational overhead for sensitivity analysis and noise generation |
+| Federated Learning | Allows collaborative learning without sharing raw data<br>Data remains decentralized, improving security<br>No need for encrypted computation | Increased communication overhead<br>Potentially slower model convergence<br>Uneven client device capabilities |
+| Secure Multi-Party Computation | Enables joint computation on sensitive data<br>Provides cryptographic privacy guarantees<br>Flexible protocols for various functions | Very high computational overhead<br>Complexity of implementation<br>Algorithmic constraints on function depth |
+| Homomorphic Encryption | Allows computation on encrypted data<br>Prevents intermediate state exposure | Extremely high computational cost<br>Complex cryptographic implementations<br>Restrictions on function types |
+| Synthetic Data Generation | Enables data sharing without leakage<br>Mitigates data scarcity problems | Synthetic-real gap in distributions<br>Potential for reconstructing private data<br>Biases and labeling challenges |
## Conclusion -Explanation: This final section will encapsulate the key takeaways from the chapter, providing readers with a consolidated view of the critical aspects of privacy and security in embedded AI systems. It aims to reinforce the importance of implementing robust security measures to protect data and preserve user trust. +Machine learning hardware security is a critical concern as embedded ML systems are increasingly deployed in safety-critical domains like medical devices, industrial controls, and autonomous vehicles. We have explored various threats spanning hardware bugs, physical attacks, side channels, supply chain risks, and more. Defenses like trusted execution environments, secure boot, PUFs, and hardware security modules provide multilayer protection tailored for resource-constrained embedded devices. + +However, continual vigilance is essential to track emerging attack vectors and address potential vulnerabilities through secure engineering practices across the hardware lifecycle. As ML and embedded ML spread, maintaining rigorous security foundations that match the field's accelerating pace of innovation remains imperative. -- Recap of privacy and security principles -- Importance of an integrated approach to privacy and security -- Future directions and areas for further study -s \ No newline at end of file diff --git a/references.bib b/references.bib index 4b4cf6b6b..241b0a07c 100644 --- a/references.bib +++ b/references.bib @@ -1,2827 +1,3485 @@ -@article{10242251, - title = {Training Spiking Neural Networks Using Lessons From Deep Learning}, - author = {Eshraghian, Jason K. and Ward, Max and Neftci, Emre O. 
and Wang, Xinxin and Lenz, Gregor and Dwivedi, Girish and Bennamoun, Mohammed and Jeong, Doo Seok and Lu, Wei D.}, - year = 2023, - journal = {Proceedings of the IEEE}, - volume = 111, - number = 9, - pages = {1016--1054}, +@article{Ratner_Hancock_Dunnmon_Goldman_Ré_2018, + title = {Snorkel metal: Weak supervision for multi-task learning.}, + author = {Ratner, Alex and Hancock, Braden and Dunnmon, Jared and Goldman, Roger and R\'{e}, Christopher}, + year = 2018, + journal = {Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning}, } -@inproceedings{abadi2016deep, - title = {Deep learning with differential privacy}, - author = {Abadi, Martin and Chu, Andy and Goodfellow, Ian and McMahan, H Brendan and Mironov, Ilya and Talwar, Kunal and Zhang, Li}, - year = 2016, - booktitle = {Proceedings of the 2016 ACM SIGSAC conference on computer and communications security}, - pages = {308--318}, +@inproceedings{sculley2015hidden, + title = {"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI}, + author = {Nithya Sambasivan and Shivani Kapania and Hannah Highfill and Diana Akrong and Praveen Kumar Paritosh and Lora Mois Aroyo}, + year = 2021, } -@inproceedings{abadi2016tensorflow, - title = {$\{$TensorFlow\$\}\$: a system for \$\{\$Large-Scale\$\}\$ machine learning}, - author = {Abadi, Mart{\'\i}n and Barham, Paul and Chen, Jianmin and Chen, Zhifeng and Davis, Andy and Dean, Jeffrey and Devin, Matthieu and Ghemawat, Sanjay and Irving, Geoffrey and Isard, Michael and others}, - year = 2016, - booktitle = {12th USENIX symposium on operating systems design and implementation (OSDI 16)}, - pages = {265--283}, +@inproceedings{kocher1996timing, + title={Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems}, + author={Kocher, Paul C}, + booktitle={Advances in Cryptology—CRYPTO’96: 16th Annual International Cryptology Conference Santa Barbara, California, USA August 18--22, 1996 
Proceedings 16}, + pages={104--113}, + year={1996}, + organization={Springer} } -@inproceedings{adolf2016fathom, - title = {Fathom: Reference workloads for modern deep learning methods}, - author = {Adolf, Robert and Rama, Saketh and Reagen, Brandon and Wei, Gu-Yeon and Brooks, David}, - year = 2016, - booktitle = {2016 IEEE International Symposium on Workload Characterization (IISWC)}, - pages = {1--10}, - organization = {IEEE}, +@inproceedings{agrawal2003side, + title={The EM side—channel (s)}, + author={Agrawal, Dakshi and Archambeault, Bruce and Rao, Josyula R and Rohatgi, Pankaj}, + booktitle={Cryptographic Hardware and Embedded Systems-CHES 2002: 4th International Workshop Redwood Shores, CA, USA, August 13--15, 2002 Revised Papers 4}, + pages={29--45}, + year={2003}, + organization={Springer} } -@article{afib, - title = {Mobile Photoplethysmographic Technology to Detect Atrial Fibrillation}, - author = {Yutao Guo and Hao Wang and Hui Zhang and Tong Liu and Zhaoguang Liang and Yunlong Xia and Li Yan and Yunli Xing and Haili Shi and Shuyan Li and Yanxia Liu and Fan Liu and Mei Feng and Yundai Chen and Gregory Y.H. 
Lip and null null}, - year = 2019, - journal = {Journal of the American College of Cardiology}, - volume = 74, - number = 19, - pages = {2365--2375}, +@article{breier2018deeplaser, + title={Deeplaser: Practical fault attack on deep neural networks}, + author={Breier, Jakub and Hou, Xiaolu and Jap, Dirmanto and Ma, Lei and Bhasin, Shivam and Liu, Yang}, + journal={arXiv preprint arXiv:1806.05859}, + year={2018} +} + + +@inproceedings{skorobogatov2003optical, + title={Optical fault induction attacks}, + author={Skorobogatov, Sergei P and Anderson, Ross J}, + booktitle={Cryptographic Hardware and Embedded Systems-CHES 2002: 4th International Workshop Redwood Shores, CA, USA, August 13--15, 2002 Revised Papers 4}, + pages={2--12}, + year={2003}, + organization={Springer} +} + +@inproceedings{skorobogatov2009local, + title={Local heating attacks on flash memory devices}, + author={Skorobogatov, Sergei}, + booktitle={2009 IEEE International Workshop on Hardware-Oriented Security and Trust}, + pages={1--6}, + year={2009}, + organization={IEEE} +} + + +@article{oprea2022poisoning, + title={Poisoning Attacks Against Machine Learning: Can Machine Learning Be Trustworthy?}, + author={Oprea, Alina and Singhal, Anoop and Vassilev, Apostol}, + journal={Computer}, + volume={55}, + number={11}, + pages={94--99}, + year={2022}, + publisher={IEEE} +} + +@inproceedings{antonakakis2017understanding, + title={Understanding the mirai botnet}, + author={Antonakakis, Manos and April, Tim and Bailey, Michael and Bernhard, Matt and Bursztein, Elie and Cochran, Jaime and Durumeric, Zakir and Halderman, J Alex and Invernizzi, Luca and Kallitsis, Michalis and others}, + booktitle={26th USENIX security symposium (USENIX Security 17)}, + pages={1093--1110}, + year={2017} } +@article{goodfellow2020generative, + title={Generative adversarial networks}, + author={Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and 
Bengio, Yoshua}, + journal={Communications of the ACM}, + volume={63}, + number={11}, + pages={139--144}, + year={2020}, + publisher={ACM New York, NY, USA} +} + + +@conference{Rombach22cvpr, +title = {High-Resolution Image Synthesis with Latent Diffusion Models}, +author = {Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer}, +url = {https://github.com/CompVis/latent-diffusionhttps://arxiv.org/abs/2112.10752}, +year = {2022}, +booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, +} + + +@inproceedings{ramesh2021zero, + title={Zero-shot text-to-image generation}, + author={Ramesh, Aditya and Pavlov, Mikhail and Goh, Gabriel and Gray, Scott and Voss, Chelsea and Radford, Alec and Chen, Mark and Sutskever, Ilya}, + booktitle={International Conference on Machine Learning}, + pages={8821--8831}, + year={2021}, + organization={PMLR} +} + +@article{shan2023prompt, + title={Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models}, + author={Shan, Shawn and Ding, Wenxin and Passananti, Josephine and Zheng, Haitao and Zhao, Ben Y}, + journal={arXiv preprint arXiv:2310.13828}, + year={2023} +} + +@article{soufleri2023synthetic, + author = {Efstathia Soufleri and Gobinda Saha and Kaushik Roy}, + date-added = {2023-11-22 19:26:18 -0500}, + date-modified = {2023-11-22 19:26:57 -0500}, + journal = {arXiv preprint arXiv:2210.03205}, + title = {Synthetic Dataset Generation for Privacy-Preserving Machine Learning}, + year = {2023}} + +@article{eldan2023whos, + author = {Ronen Eldan and Mark Russinovich}, + date-added = {2023-11-22 19:24:35 -0500}, + date-modified = {2023-11-22 19:25:20 -0500}, + journal = {arXiv preprint arXiv:2310.02238}, + title = {Who's Harry Potter? 
Approximate Unlearning in LLMs}, + year = {2023}} + +@article{khan2021knowledgeadaptation, + author = {Mohammad Emtiyaz Khan and Siddharth Swaroop}, + date-added = {2023-11-22 19:22:50 -0500}, + date-modified = {2023-11-22 19:23:40 -0500}, + journal = {arXiv preprint arXiv:2106.08769}, + title = {Knowledge-Adaptation Priors}, + year = {2021}} + +@article{tarun2023deep, + author = {Ayush K Tarun and Vikram S Chundawat and Murari Mandal and Mohan Kankanhalli}, + date-added = {2023-11-22 19:20:59 -0500}, + date-modified = {2023-11-22 19:21:59 -0500}, + journal = {arXiv preprint arXiv:2210.08196}, + title = {Deep Regression Unlearning}, + year = {2023}} + +@article{Li2020Federated, + author = {Li, Tian and Sahu, Anit Kumar and Talwalkar, Ameet and Smith, Virginia}, + date-added = {2023-11-22 19:15:13 -0500}, + date-modified = {2023-11-22 19:17:19 -0500}, + journal = {IEEE Signal Processing Magazine}, + number = {3}, + pages = {50-60}, + title = {Federated Learning: Challenges, Methods, and Future Directions}, + volume = {37}, + year = {2020}} + +@article{MAL-083, + author = {Peter Kairouz and H. Brendan McMahan and Brendan Avent and Aur{\'e}lien Bellet and Mehdi Bennis and Arjun Nitin Bhagoji and Kallista Bonawitz and Zachary Charles and Graham Cormode and Rachel Cummings and Rafael G. L. D'Oliveira and Hubert Eichner and Salim El Rouayheb and David Evans and Josh Gardner and Zachary Garrett and Adri{\`a} Gasc{\'o}n and Badih Ghazi and Phillip B. Gibbons and Marco Gruteser and Zaid Harchaoui and Chaoyang He and Lie He and Zhouyuan Huo and Ben Hutchinson and Justin Hsu and Martin Jaggi and Tara Javidi and Gauri Joshi and Mikhail Khodak and Jakub Konecn{\'y} and Aleksandra Korolova and Farinaz Koushanfar and Sanmi Koyejo and Tancr{\`e}de Lepoint and Yang Liu and Prateek Mittal and Mehryar Mohri and Richard Nock and Ayfer {\"O}zg{\"u}r and Rasmus Pagh and Hang Qi and Daniel Ramage and Ramesh Raskar and Mariana Raykova and Dawn Song and Weikang Song and Sebastian U. 
Stich and Ziteng Sun and Ananda Theertha Suresh and Florian Tram{\`e}r and Praneeth Vepakomma and Jianyu Wang and Li Xiong and Zheng Xu and Qiang Yang and Felix X. Yu and Han Yu and Sen Zhao}, + date-added = {2023-11-22 19:14:08 -0500}, + date-modified = {2023-11-22 19:14:08 -0500}, + doi = {10.1561/2200000083}, + issn = {1935-8237}, + journal = {Foundations and Trends{\textregistered} in Machine Learning}, + number = {1--2}, + pages = {1-210}, + title = {Advances and Open Problems in Federated Learning}, + url = {http://dx.doi.org/10.1561/2200000083}, + volume = {14}, + year = {2021}, + Bdsk-Url-1 = {http://dx.doi.org/10.1561/2200000083}} + +@inproceedings{abadi2016deep, + address = {New York, NY, USA}, + author = {Abadi, Martin and Chu, Andy and Goodfellow, Ian and McMahan, H. Brendan and Mironov, Ilya and Talwar, Kunal and Zhang, Li}, + booktitle = {Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security}, + date-added = {2023-11-22 18:06:03 -0500}, + date-modified = {2023-11-22 18:08:42 -0500}, + keywords = {deep learning, differential privacy}, + pages = {308--318}, + publisher = {Association for Computing Machinery}, + series = {CCS '16}, + title = {Deep Learning with Differential Privacy}, + year = {2016}} + +@inproceedings{Dwork2006Theory, + address = {Berlin, Heidelberg}, + author = {Dwork, Cynthia and McSherry, Frank and Nissim, Kobbi and Smith, Adam}, + booktitle = {Theory of Cryptography}, + date-added = {2023-11-22 18:04:12 -0500}, + date-modified = {2023-11-22 18:05:20 -0500}, + editor = {Halevi, Shai and Rabin, Tal}, + pages = {265-284}, + publisher = {Springer Berlin Heidelberg}, + title = {Calibrating Noise to Sensitivity in Private Data Analysis}, + year = {2006}} + +@article{Gupta2023ChatGPT, + author = {Gupta, Maanak and Akiri, Charankumar and Aryal, Kshitiz and Parker, Eli and Praharaj, Lopamudra}, + date-added = {2023-11-22 18:01:41 -0500}, + date-modified = {2023-11-22 18:02:55 -0500}, + journal = {IEEE Access}, 
+ pages = {80218-80245}, + title = {From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy}, + volume = {11}, + year = {2023}} + +@inproceedings{Biega2020Oper, + address = {New York, NY, USA}, + author = {Biega, Asia J. and Potash, Peter and Daum\'{e}, Hal and Diaz, Fernando and Finck, Mich\`{e}le}, + booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval}, + date-added = {2023-11-22 17:57:23 -0500}, + date-modified = {2023-11-22 17:59:54 -0500}, + keywords = {data minimization, privacy, gdpr, recommender systems, purpose limitation, personalization}, + pages = {399--408}, + publisher = {Association for Computing Machinery}, + series = {SIGIR '20}, + title = {Operationalizing the Legal Principle of Data Minimization for Personalization}, + year = {2020}} + +@article{cavoukian2009privacy, + author = {Cavoukian, Ann}, + date-added = {2023-11-22 17:55:45 -0500}, + date-modified = {2023-11-22 17:56:58 -0500}, + journal = {Office of the Information and Privacy Commissioner}, + title = {Privacy by design}, + year = {2009}} + +@article{Gao2020Physical, + author = {Gao, Yansong and Al-Sarawi, Said F. and Abbott, Derek}, + date-added = {2023-11-22 17:52:20 -0500}, + date-modified = {2023-11-22 17:54:56 -0500}, + journal = {Nature Electronics}, + month = {February}, + number = {2}, + pages = {81-91}, + title = {Physical unclonable functions}, + volume = {3}, + year = {2020}} + +@inproceedings{Rashmi2018Secure, + author = {R.V. Rashmi and A. 
Karthikeyan}, + booktitle = {2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA)}, + date-added = {2023-11-22 17:50:16 -0500}, + date-modified = {2023-11-22 17:51:39 -0500}, + pages = {291-298}, + title = {Secure boot of Embedded Applications - A Review}, + year = {2018}} + +@article{miller2015remote, + author = {Miller, Charlie and Valasek, Chris}, + date-added = {2023-11-22 17:11:27 -0500}, + date-modified = {2023-11-22 17:12:18 -0500}, + journal = {Black Hat USA}, + number = {S 91}, + pages = {1-91}, + title = {Remote exploitation of an unaltered passenger vehicle}, + volume = {2015}, + year = {2015}} + +@book{dhanjani2015abusing, + author = {Dhanjani, Nitesh}, + date-added = {2023-11-22 17:09:41 -0500}, + date-modified = {2023-11-22 17:10:22 -0500}, + publisher = {O'Reilly Media, Inc.}, + title = {Abusing the internet of things: blackouts, freakouts, and stakeouts}, + year = {2015}} + +@inproceedings{zhao2018fpga, + author = {Zhao, Mark and Suh, G Edward}, + booktitle = {2018 IEEE Symposium on Security and Privacy (SP)}, + date-added = {2023-11-22 17:08:21 -0500}, + date-modified = {2023-11-22 17:09:07 -0500}, + organization = {IEEE}, + pages = {229-244}, + title = {FPGA-based remote power side-channel attacks}, + year = {2018}} + +@inproceedings{gnad2017voltage, + author = {Gnad, Dennis RE and Oboril, Fabian and Tahoori, Mehdi B}, + booktitle = {2017 27th International Conference on Field Programmable Logic and Applications (FPL)}, + date-added = {2023-11-22 17:07:13 -0500}, + date-modified = {2023-11-22 17:07:59 -0500}, + organization = {IEEE}, + pages = {1-7}, + title = {Voltage drop-based fault attacks on FPGAs using valid bitstreams}, + year = {2017}} + +@inproceedings{Asonov2004Keyboard, + author = {Asonov, D. and Agrawal, R.}, + booktitle = {IEEE Symposium on Security and Privacy, 2004. Proceedings. 
2004}, + date-added = {2023-11-22 17:05:39 -0500}, + date-modified = {2023-11-22 17:06:45 -0500}, + organization = {IEEE}, + pages = {3-11}, + title = {Keyboard acoustic emanations}, + year = {2004}} + +@article{Burnet1989Spycatcher, + author = {David Burnet and Richard Thomas}, + date-added = {2023-11-22 17:03:00 -0500}, + date-modified = {2023-11-22 17:04:44 -0500}, + journal = {Journal of Law and Society}, + number = {2}, + pages = {210-224}, + title = {Spycatcher: The Commodification of Truth}, + volume = {16}, + year = {1989}} + +@article{Kocher2011Intro, + author = {Kocher, Paul and Jaffe, Joshua and Jun, Benjamin and Rohatgi, Pankaj}, + date-added = {2023-11-22 16:58:42 -0500}, + date-modified = {2023-11-22 17:00:36 -0500}, + journal = {Journal of Cryptographic Engineering}, + month = {April}, + number = {1}, + pages = {5-27}, + title = {Introduction to differential power analysis}, + volume = {1}, + year = {2011}} + +@inproceedings{gandolfi2001electromagnetic, + author = {Gandolfi, Karine and Mourtel, Christophe and Olivier, Francis}, + booktitle = {Cryptographic Hardware and Embedded Systems---CHES 2001: Third International Workshop Paris, France, May 14--16, 2001 Proceedings 3}, + date-added = {2023-11-22 16:56:42 -0500}, + date-modified = {2023-11-22 16:57:40 -0500}, + organization = {Springer}, + pages = {251-261}, + title = {Electromagnetic analysis: Concrete results}, + year = {2001}} + +@inproceedings{kocher1999differential, + author = {Kocher, Paul and Jaffe, Joshua and Jun, Benjamin}, + booktitle = {Advances in Cryptology---CRYPTO'99: 19th Annual International Cryptology Conference Santa Barbara, California, USA, August 15--19, 1999 Proceedings 19}, + date-added = {2023-11-22 16:55:28 -0500}, + date-modified = {2023-11-22 16:56:18 -0500}, + organization = {Springer}, + pages = {388-397}, + title = {Differential power analysis}, + year = {1999}} + +@inproceedings{hsiao2023mavfi, + author = {Hsiao, Yu-Shun and Wan, Zishen and Jia, Tianyu and Ghosal, 
Radhika and Mahmoud, Abdulrahman and Raychowdhury, Arijit and Brooks, David and Wei, Gu-Yeon and Reddi, Vijay Janapa}, + booktitle = {2023 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, + date-added = {2023-11-22 16:54:11 -0500}, + date-modified = {2023-11-22 16:55:12 -0500}, + organization = {IEEE}, + pages = {1-6}, + title = {Mavfi: An end-to-end fault analysis framework with anomaly detection and recovery for micro aerial vehicles}, + year = {2023}} + +@inproceedings{Breier2018Practical, + address = {New York, NY, USA}, + author = {Breier, Jakub and Hou, Xiaolu and Jap, Dirmanto and Ma, Lei and Bhasin, Shivam and Liu, Yang}, + booktitle = {Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security}, + date-added = {2023-11-22 16:51:23 -0500}, + date-modified = {2023-11-22 16:53:46 -0500}, + keywords = {fault attacks, deep learning security, adversarial attacks}, + pages = {2204--2206}, + publisher = {Association for Computing Machinery}, + series = {CCS '18}, + title = {Practical Fault Attack on Deep Neural Networks}} + +@inproceedings{govindavajhala2003using, + author = {Govindavajhala, Sudhakar and Appel, Andrew W}, + booktitle = {2003 Symposium on Security and Privacy, 2003.}, + date-added = {2023-11-22 16:46:13 -0500}, + date-modified = {2023-11-22 16:47:03 -0500}, + organization = {IEEE}, + pages = {154-156}, + title = {Using memory errors to attack a virtual machine}, + year = {2003}} + +@inproceedings{amiel2006fault, + author = {Amiel, Frederic and Clavier, Christophe and Tunstall, Michael}, + booktitle = {International Workshop on Fault Diagnosis and Tolerance in Cryptography}, + date-added = {2023-11-22 16:45:05 -0500}, + date-modified = {2023-11-22 16:45:55 -0500}, + organization = {Springer}, + pages = {223-236}, + title = {Fault analysis of DPA-resistant algorithms}, + year = {2006}} + +@inproceedings{hutter2009contact, + author = {Hutter, Michael and Schmidt, Jorn-Marc and Plos, Thomas}, + booktitle 
= {2009 European Conference on Circuit Theory and Design}, + date-added = {2023-11-22 16:43:29 -0500}, + date-modified = {2023-11-22 16:44:30 -0500}, + organization = {IEEE}, + pages = {409-412}, + title = {Contact-based fault injections and power analysis on RFID tags}, + year = {2009}} + +@inproceedings{barenghi2010low, + author = {Barenghi, Alessandro and Bertoni, Guido M and Breveglieri, Luca and Pellicioli, Mauro and Pelosi, Gerardo}, + booktitle = {2010 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST)}, + date-added = {2023-11-22 16:42:05 -0500}, + date-modified = {2023-11-22 16:43:09 -0500}, + organization = {IEEE}, + pages = {7-12}, + title = {Low voltage fault attacks to AES}, + year = {2010}} + +@book{joye2012fault, + author = {Joye, Marc and Tunstall, Michael}, + date-added = {2023-11-22 16:35:24 -0500}, + date-modified = {2023-11-22 16:36:20 -0500}, + publisher = {Springer Publishing Company, Incorporated}, + title = {Fault Analysis in Cryptography}, + year = {2012}} + +@inproceedings{Kocher2018spectre, + author = {Paul Kocher and Jann Horn and Anders Fogh and Daniel Genkin and Daniel Gruss and Werner Haas and Mike Hamburg and Moritz Lipp and Stefan Mangard and Thomas Prescher and Michael Schwarz and Yuval Yarom}, + booktitle = {40th IEEE Symposium on Security and Privacy (S\&P'19)}, + date-added = {2023-11-22 16:33:35 -0500}, + date-modified = {2023-11-22 16:34:01 -0500}, + title = {Spectre Attacks: Exploiting Speculative Execution}, + year = {2019}} + +@inproceedings{Lipp2018meltdown, + author = {Moritz Lipp and Michael Schwarz and Daniel Gruss and Thomas Prescher and Werner Haas and Anders Fogh and Jann Horn and Stefan Mangard and Paul Kocher and Daniel Genkin and Yuval Yarom and Mike Hamburg}, + booktitle = {27th {USENIX} Security Symposium ({USENIX} Security 18)}, + date-added = {2023-11-22 16:32:26 -0500}, + date-modified = {2023-11-22 16:33:08 -0500}, + title = {Meltdown: Reading Kernel Memory from User Space}, + 
year = {2018}} + +@article{eykholt2018robust, + author = {Kevin Eykholt and Ivan Evtimov and Earlence Fernandes and Bo Li and Amir Rahmati and Chaowei Xiao and Atul Prakash and Tadayoshi Kohno and Dawn Song}, + date-added = {2023-11-22 16:30:51 -0500}, + date-modified = {2023-11-22 16:31:55 -0500}, + journal = {arXiv preprint arXiv:1707.08945}, + title = {Robust Physical-World Attacks on Deep Learning Models}, + year = {2018}} + +@inproceedings{Abdelkader_2020, + author = {Abdelkader, Ahmed and Curry, Michael J. and Fowl, Liam and Goldstein, Tom and Schwarzschild, Avi and Shu, Manli and Studer, Christoph and Zhu, Chen}, + booktitle = {ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, + date-added = {2023-11-22 16:28:31 -0500}, + date-modified = {2023-11-22 16:29:33 -0500}, + title = {Headless Horseman: Adversarial Attacks on Transfer Learning Models}, + year = {2020}} + +@article{parrish2023adversarial, + author = {Alicia Parrish and Hannah Rose Kirk and Jessica Quaye and Charvi Rastogi and Max Bartolo and Oana Inel and Juan Ciro and Rafael Mosquera and Addison Howard and Will Cukierski and D. 
Sculley and Vijay Janapa Reddi and Lora Aroyo}, + date-added = {2023-11-22 16:24:50 -0500}, + date-modified = {2023-11-22 16:26:30 -0500}, + journal = {arXiv preprint arXiv:2305.14384}, + title = {Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models}, + year = {2023}} + +@article{hosseini2017deceiving, + author = {Hosseini, Hossein and Kannan, Sreeram and Zhang, Baosen and Poovendran, Radha}, + date-added = {2023-11-22 16:22:18 -0500}, + date-modified = {2023-11-22 16:23:43 -0500}, + journal = {arXiv preprint arXiv:1702.08138}, + title = {Deceiving google's perspective api built for detecting toxic comments}, + year = {2017}} + +@article{biggio2012poisoning, + author = {Biggio, Battista and Nelson, Blaine and Laskov, Pavel}, + date-added = {2023-11-22 16:21:35 -0500}, + date-modified = {2023-11-22 16:22:06 -0500}, + journal = {arXiv preprint arXiv:1206.6389}, + title = {Poisoning attacks against support vector machines}, + year = {2012}} + +@article{oliynyk2023know, + author = {Oliynyk, Daryna and Mayer, Rudolf and Rauber, Andreas}, + date-added = {2023-11-22 16:18:21 -0500}, + date-modified = {2023-11-22 16:20:44 -0500}, + journal = {ACM Comput. 
Surv.}, + keywords = {model stealing, Machine learning, model extraction}, + month = {July}, + number = {14s}, + title = {I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences}, + volume = {55}, + year = {2023}} + +@article{narayanan2006break, + author = {Narayanan, Arvind and Shmatikov, Vitaly}, + date-added = {2023-11-22 16:16:19 -0500}, + date-modified = {2023-11-22 16:16:59 -0500}, + journal = {arXiv preprint cs/0610105}, + title = {How to break anonymity of the netflix prize dataset}, + year = {2006}} + +@article{ateniese2015hacking, + author = {Ateniese, Giuseppe and Mancini, Luigi V and Spognardi, Angelo and Villani, Antonio and Vitali, Domenico and Felici, Giovanni}, + date-added = {2023-11-22 16:14:42 -0500}, + date-modified = {2023-11-22 16:15:42 -0500}, + journal = {International Journal of Security and Networks}, + number = {3}, + pages = {137-150}, + title = {Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers}, + volume = {10}, + year = {2015}} + +@article{miller2019lessons, + author = {Miller, Charlie}, + date-added = {2023-11-22 16:12:04 -0500}, + date-modified = {2023-11-22 16:13:31 -0500}, + journal = {IEEE Design & Test}, + number = {6}, + pages = {7-9}, + title = {Lessons learned from hacking a car}, + volume = {36}, + year = {2019}} + +@article{farwell2011stuxnet, + author = {Farwell, James P and Rohozinski, Rafal}, + date-added = {2023-11-22 14:03:31 -0500}, + date-modified = {2023-11-22 14:05:19 -0500}, + journal = {Survival}, + number = {1}, + pages = {23-40}, + title = {Stuxnet and the future of cyber war}, + volume = {53}, + year = {2011}} + +@inproceedings{krishnan2023archgym, + author = {Krishnan, Srivatsan and Yazdanbakhsh, Amir and Prakash, Shvetank and Jabbour, Jason and Uchendu, Ikechukwu and Ghosh, Susobhan and Boroujerdian, Behzad and Richins, Daniel and Tripathy, Devashree and Faust, Aleksandra and Janapa Reddi, Vijay}, + booktitle = 
{Proceedings of the 50th Annual International Symposium on Computer Architecture}, + pages = {1--16}, + title = {ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design}, + year = {2023}} + +@misc{kuzmin2022fp8, + archiveprefix = {arXiv}, + author = {Andrey Kuzmin and Mart Van Baalen and Yuwei Ren and Markus Nagel and Jorn Peters and Tijmen Blankevoort}, + eprint = {2208.09225}, + primaryclass = {cs.LG}, + title = {FP8 Quantization: The Power of the Exponent}, + year = {2022}} + +@inproceedings{abadi2016tensorflow, + author = {Abadi, Mart{\'\i}n and Barham, Paul and Chen, Jianmin and Chen, Zhifeng and Davis, Andy and Dean, Jeffrey and Devin, Matthieu and Ghemawat, Sanjay and Irving, Geoffrey and Isard, Michael and others}, + booktitle = {12th USENIX symposium on operating systems design and implementation (OSDI 16)}, + pages = {265--283}, + title = {$\{$TensorFlow$\}$: a system for $\{$Large-Scale$\}$ machine learning}, + year = 2016} + +@article{shastri2021photonics, + author = {Shastri, Bhavin J and Tait, Alexander N and Ferreira de Lima, Thomas and Pernice, Wolfram HP and Bhaskaran, Harish and Wright, C David and Prucnal, Paul R}, + journal = {Nature Photonics}, + number = {2}, + pages = {102--114}, + publisher = {Nature Publishing Group UK London}, + title = {Photonics for artificial intelligence and neuromorphic computing}, + volume = {15}, + year = {2021}} + +@inproceedings{jouppi2017datacenter, + author = {Jouppi, Norman P and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and others}, + booktitle = {Proceedings of the 44th annual international symposium on computer architecture}, + pages = {1--12}, + title = {In-datacenter performance analysis of a tensor processing unit}, + year = {2017}} + +@inproceedings{ignatov2018ai, + author = {Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and 
Hartley, Tim and Van Gool, Luc}, + booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops}, + pages = {0--0}, + title = {Ai benchmark: Running deep neural networks on android smartphones}, + year = {2018}} + +@inproceedings{adolf2016fathom, + author = {Adolf, Robert and Rama, Saketh and Reagen, Brandon and Wei, Gu-Yeon and Brooks, David}, + booktitle = {2016 IEEE International Symposium on Workload Characterization (IISWC)}, + organization = {IEEE}, + pages = {1--10}, + title = {Fathom: Reference workloads for modern deep learning methods}, + year = 2016} + @misc{al2016theano, - title = {Theano: A Python framework for fast computation of mathematical expressions}, - author = {The Theano Development Team and Rami Al-Rfou and Guillaume Alain and Amjad Almahairi and Christof Angermueller and Dzmitry Bahdanau and Nicolas Ballas and Fr\'{e}d\'{e}ric Bastien and Justin Bayer and Anatoly Belikov and Alexander Belopolsky and Yoshua Bengio and Arnaud Bergeron and James Bergstra and Valentin Bisson and Josh Bleecher Snyder and Nicolas Bouchard and Nicolas Boulanger-Lewandowski and Xavier Bouthillier and Alexandre de Br\'{e}bisson and Olivier Breuleux and Pierre-Luc Carrier and Kyunghyun Cho and Jan Chorowski and Paul Christiano and Tim Cooijmans and Marc-Alexandre C\^{o}t\'{e} and Myriam C\^{o}t\'{e} and Aaron Courville and Yann N. 
Dauphin and Olivier Delalleau and Julien Demouth and Guillaume Desjardins and Sander Dieleman and Laurent Dinh and M\'{e}lanie Ducoffe and Vincent Dumoulin and Samira Ebrahimi Kahou and Dumitru Erhan and Ziye Fan and Orhan Firat and Mathieu Germain and Xavier Glorot and Ian Goodfellow and Matt Graham and Caglar Gulcehre and Philippe Hamel and Iban Harlouchet and Jean-Philippe Heng and Bal\'{a}zs Hidasi and Sina Honari and Arjun Jain and S\'{e}bastien Jean and Kai Jia and Mikhail Korobov and Vivek Kulkarni and Alex Lamb and Pascal Lamblin and Eric Larsen and C\'{e}sar Laurent and Sean Lee and Simon Lefrancois and Simon Lemieux and Nicholas L\'{e}onard and Zhouhan Lin and Jesse A. Livezey and Cory Lorenz and Jeremiah Lowin and Qianli Ma and Pierre-Antoine Manzagol and Olivier Mastropietro and Robert T. McGibbon and Roland Memisevic and Bart van Merri\"{e}nboer and Vincent Michalski and Mehdi Mirza and Alberto Orlandi and Christopher Pal and Razvan Pascanu and Mohammad Pezeshki and Colin Raffel and Daniel Renshaw and Matthew Rocklin and Adriana Romero and Markus Roth and Peter Sadowski and John Salvatier and Fran\c{c}ois Savard and Jan Schl\"{u}ter and John Schulman and Gabriel Schwartz and Iulian Vlad Serban and Dmitriy Serdyuk and Samira Shabanian and \'{E}tienne Simon and Sigurd Spieckermann and S. Ramana Subramanyam and Jakub Sygnowski and J\'{e}r\'{e}mie Tanguay and Gijs van Tulder and Joseph Turian and Sebastian Urban and Pascal Vincent and Francesco Visin and Harm de Vries and David Warde-Farley and Dustin J. 
Webb and Matthew Willson and Kelvin Xu and Lijun Xue and Li Yao and Saizheng Zhang and Ying Zhang}, - year = 2016, - eprint = {1605.02688}, archiveprefix = {arXiv}, + author = {The Theano Development Team and Rami Al-Rfou and Guillaume Alain and Amjad Almahairi and Christof Angermueller and Dzmitry Bahdanau and Nicolas Ballas and Fr{\'e}d{\'e}ric Bastien and Justin Bayer and Anatoly Belikov and Alexander Belopolsky and Yoshua Bengio and Arnaud Bergeron and James Bergstra and Valentin Bisson and Josh Bleecher Snyder and Nicolas Bouchard and Nicolas Boulanger-Lewandowski and Xavier Bouthillier and Alexandre de Br{\'e}bisson and Olivier Breuleux and Pierre-Luc Carrier and Kyunghyun Cho and Jan Chorowski and Paul Christiano and Tim Cooijmans and Marc-Alexandre C{\^o}t{\'e} and Myriam C{\^o}t{\'e} and Aaron Courville and Yann N. Dauphin and Olivier Delalleau and Julien Demouth and Guillaume Desjardins and Sander Dieleman and Laurent Dinh and M{\'e}lanie Ducoffe and Vincent Dumoulin and Samira Ebrahimi Kahou and Dumitru Erhan and Ziye Fan and Orhan Firat and Mathieu Germain and Xavier Glorot and Ian Goodfellow and Matt Graham and Caglar Gulcehre and Philippe Hamel and Iban Harlouchet and Jean-Philippe Heng and Bal{\'a}zs Hidasi and Sina Honari and Arjun Jain and S{\'e}bastien Jean and Kai Jia and Mikhail Korobov and Vivek Kulkarni and Alex Lamb and Pascal Lamblin and Eric Larsen and C{\'e}sar Laurent and Sean Lee and Simon Lefrancois and Simon Lemieux and Nicholas L{\'e}onard and Zhouhan Lin and Jesse A. Livezey and Cory Lorenz and Jeremiah Lowin and Qianli Ma and Pierre-Antoine Manzagol and Olivier Mastropietro and Robert T. 
McGibbon and Roland Memisevic and Bart van Merri{\"e}nboer and Vincent Michalski and Mehdi Mirza and Alberto Orlandi and Christopher Pal and Razvan Pascanu and Mohammad Pezeshki and Colin Raffel and Daniel Renshaw and Matthew Rocklin and Adriana Romero and Markus Roth and Peter Sadowski and John Salvatier and Fran{\c c}ois Savard and Jan Schl{\"u}ter and John Schulman and Gabriel Schwartz and Iulian Vlad Serban and Dmitriy Serdyuk and Samira Shabanian and {\'E}tienne Simon and Sigurd Spieckermann and S. Ramana Subramanyam and Jakub Sygnowski and J{\'e}r{\'e}mie Tanguay and Gijs van Tulder and Joseph Turian and Sebastian Urban and Pascal Vincent and Francesco Visin and Harm de Vries and David Warde-Farley and Dustin J. Webb and Matthew Willson and Kelvin Xu and Lijun Xue and Li Yao and Saizheng Zhang and Ying Zhang}, + eprint = {1605.02688}, primaryclass = {cs.SC}, -} + title = {Theano: A Python framework for fast computation of mathematical expressions}, + year = 2016} @article{Aledhari_Razzak_Parizi_Saeed_2020, - title = {Federated learning: A survey on enabling technologies, Protocols, and applications}, - author = {Aledhari, Mohammed and Razzak, Rehma and Parizi, Reza M. and Saeed, Fahad}, - year = 2020, - journal = {IEEE Access}, - volume = 8, - pages = {140699–140725}, -} + author = {Aledhari, Mohammed and Razzak, Rehma and Parizi, Reza M. and Saeed, Fahad}, + doi = {10.1109/access.2020.3013541}, + journal = {IEEE Access}, + pages = {140699--140725}, + title = {Federated learning: A survey on enabling technologies, Protocols, and applications}, + volume = 8, + year = 2020, + Bdsk-Url-1 = {https://doi.org/10.1109/access.2020.3013541}} @article{aljundi_gradient_nodate, - title = {Gradient based sample selection for online continual learning}, - author = {Aljundi, Rahaf and Lin, Min and Goujaud, Baptiste and Bengio, Yoshua}, - language = {en}, -} + author = {Aljundi, Rahaf and Lin, Min and Goujaud, Baptiste and Bengio, Yoshua}, + file = {Aljundi et al. 
- Gradient based sample selection for online continu.pdf:/Users/alex/Zotero/storage/GPHM4KY7/Aljundi et al. - Gradient based sample selection for online continu.pdf:application/pdf}, + language = {en}, + title = {Gradient based sample selection for online continual learning}} @inproceedings{altayeb2022classifying, - title = {Classifying mosquito wingbeat sound using TinyML}, - author = {Altayeb, Moez and Zennaro, Marco and Rovai, Marcelo}, - year = 2022, - booktitle = {Proceedings of the 2022 ACM Conference on Information Technology for Social Good}, - pages = {132--137}, -} + author = {Altayeb, Moez and Zennaro, Marco and Rovai, Marcelo}, + booktitle = {Proceedings of the 2022 ACM Conference on Information Technology for Social Good}, + pages = {132--137}, + title = {Classifying mosquito wingbeat sound using TinyML}, + year = 2022} @misc{amodei_ai_2018, - title = {{AI} and {Compute}}, - author = {Amodei, Dario and Hernandez, Danny}, - year = 2018, - month = may, - journal = {OpenAI Blog}, - url = {https://openai.com/research/ai-and-compute}, -} + author = {Amodei, Dario and Hernandez, Danny}, + journal = {OpenAI Blog}, + month = may, + title = {{AI} and {Compute}}, + url = {https://openai.com/research/ai-and-compute}, + year = 2018, + Bdsk-Url-1 = {https://openai.com/research/ai-and-compute}} @inproceedings{antol2015vqa, - title = {Vqa: Visual question answering}, - author = {Antol, Stanislaw and Agrawal, Aishwarya and Lu, Jiasen and Mitchell, Margaret and Batra, Dhruv and Zitnick, C Lawrence and Parikh, Devi}, - year = 2015, - booktitle = {Proceedings of the IEEE international conference on computer vision}, - pages = {2425--2433}, -} + author = {Antol, Stanislaw and Agrawal, Aishwarya and Lu, Jiasen and Mitchell, Margaret and Batra, Dhruv and Zitnick, C Lawrence and Parikh, Devi}, + booktitle = {Proceedings of the IEEE international conference on computer vision}, + pages = {2425--2433}, + title = {Vqa: Visual question answering}, + year = 2015} 
@article{app112211073, - title = {Hardware/Software Co-Design for TinyML Voice-Recognition Application on Resource Frugal Edge Devices}, - author = {Kwon, Jisu and Park, Daejin}, - year = 2021, - journal = {Applied Sciences}, - volume = 11, - number = 22, - url = {https://www.mdpi.com/2076-3417/11/22/11073}, article-number = 11073, -} + author = {Kwon, Jisu and Park, Daejin}, + doi = {10.3390/app112211073}, + issn = {2076-3417}, + journal = {Applied Sciences}, + number = 22, + title = {Hardware/Software Co-Design for TinyML Voice-Recognition Application on Resource Frugal Edge Devices}, + url = {https://www.mdpi.com/2076-3417/11/22/11073}, + volume = 11, + year = 2021, + Bdsk-Url-1 = {https://www.mdpi.com/2076-3417/11/22/11073}, + Bdsk-Url-2 = {https://doi.org/10.3390/app112211073}} @article{Ardila_Branson_Davis_Henretty_Kohler_Meyer_Morais_Saunders_Tyers_Weber_2020, - title = {Common Voice: A Massively-Multilingual Speech Corpus}, - author = {Ardila, Rosana and Branson, Megan and Davis, Kelly and Henretty, Michael and Kohler, Michael and Meyer, Josh and Morais, Reuben and Saunders, Lindsay and Tyers, Francis M. and Weber, Gregor}, - year = 2020, - month = may, - journal = {Proceedings of the 12th Conference on Language Resources and Evaluation}, - pages = {4218--4222}, -} + author = {Ardila, Rosana and Branson, Megan and Davis, Kelly and Henretty, Michael and Kohler, Michael and Meyer, Josh and Morais, Reuben and Saunders, Lindsay and Tyers, Francis M. 
and Weber, Gregor}, + journal = {Proceedings of the 12th Conference on Language Resources and Evaluation}, + month = {May}, + pages = {4218-4222}, + title = {Common Voice: A Massively-Multilingual Speech Corpus}, + year = 2020} @misc{awq, - title = {AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration}, - author = {Lin and Tang, Tang and Yang, Dang and Gan, Han}, - year = 2023, - url = {https://arxiv.org/abs/2306.00978}, - urldate = {2023-10-03}, -} - -@misc{bailey_enabling_2018, - title = {Enabling {Cheaper} {Design}}, - author = {Bailey, Brian}, - year = 2018, - month = sep, - journal = {Semiconductor Engineering}, - url = {https://semiengineering.com/enabling-cheaper-design/}, - urldate = {2023-11-07}, - language = {en-US}, -} - -@article{bains2020business, - title = {The business of building brains}, - author = {Bains, Sunny}, - year = 2020, - journal = {Nat. Electron}, - volume = 3, - number = 7, - pages = {348--351}, -} + author = {Lin and Tang, Tang and Yang, Dang and Gan, Han}, + doi = {10.48550/arXiv.2306.00978}, + title = {AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration}, + url = {https://arxiv.org/abs/2306.00978}, + urldate = {2023-10-03}, + year = 2023, + Bdsk-Url-1 = {https://arxiv.org/abs/2306.00978}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2306.00978}} @inproceedings{bamoumen2022tinyml, - title = {How TinyML Can be Leveraged to Solve Environmental Problems: A Survey}, - author = {Bamoumen, Hatim and Temouden, Anas and Benamar, Nabil and Chtouki, Yousra}, - year = 2022, - booktitle = {2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)}, - pages = {338--343}, + author = {Bamoumen, Hatim and Temouden, Anas and Benamar, Nabil and Chtouki, Yousra}, + booktitle = {2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)}, organization = {IEEE}, -} + pages = {338--343}, + title 
= {How TinyML Can be Leveraged to Solve Environmental Problems: A Survey}, + year = 2022} @article{banbury2020benchmarking, - title = {Benchmarking tinyml systems: Challenges and direction}, - author = {Banbury, Colby R and Reddi, Vijay Janapa and Lam, Max and Fu, William and Fazel, Amin and Holleman, Jeremy and Huang, Xinyuan and Hurtado, Robert and Kanter, David and Lokhmotov, Anton and others}, - year = 2020, - journal = {arXiv preprint arXiv:2003.04821}, -} + author = {Banbury, Colby R and Reddi, Vijay Janapa and Lam, Max and Fu, William and Fazel, Amin and Holleman, Jeremy and Huang, Xinyuan and Hurtado, Robert and Kanter, David and Lokhmotov, Anton and others}, + journal = {arXiv preprint arXiv:2003.04821}, + title = {Benchmarking tinyml systems: Challenges and direction}, + year = 2020} @article{bank2023autoencoders, - title = {Autoencoders}, - author = {Bank, Dor and Koenigstein, Noam and Giryes, Raja}, - year = 2023, - journal = {Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook}, - publisher = {Springer}, - pages = {353--374}, -} + author = {Bank, Dor and Koenigstein, Noam and Giryes, Raja}, + journal = {Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook}, + pages = {353--374}, + publisher = {Springer}, + title = {Autoencoders}, + year = 2023} @book{barroso2019datacenter, - title = {The datacenter as a computer: Designing warehouse-scale machines}, - author = {Barroso, Luiz Andr{\'e} and H{\"o}lzle, Urs and Ranganathan, Parthasarathy}, - year = 2019, - publisher = {Springer Nature}, -} + author = {Barroso, Luiz Andr{\'e} and H{\"o}lzle, Urs and Ranganathan, Parthasarathy}, + publisher = {Springer Nature}, + title = {The datacenter as a computer: Designing warehouse-scale machines}, + year = 2019} @article{Bender_Friedman_2018, - title = {Data statements for natural language processing: Toward mitigating system bias and enabling better science}, - author = {Bender, Emily M. 
and Friedman, Batya}, - year = 2018, - journal = {Transactions of the Association for Computational Linguistics}, - volume = 6, - pages = {587--604}, -} + author = {Bender, Emily M. and Friedman, Batya}, + doi = {10.1162/tacl_a_00041}, + journal = {Transactions of the Association for Computational Linguistics}, + pages = {587-604}, + title = {Data statements for natural language processing: Toward mitigating system bias and enabling better science}, + volume = 6, + year = 2018, + Bdsk-Url-1 = {https://doi.org/10.1162/tacl_a_00041}} @article{beyer2020we, - title = {Are we done with imagenet?}, - author = {Beyer, Lucas and H{\'e}naff, Olivier J and Kolesnikov, Alexander and Zhai, Xiaohua and Oord, A{\"a}ron van den}, - year = 2020, - journal = {arXiv preprint arXiv:2006.07159}, -} + author = {Beyer, Lucas and H{\'e}naff, Olivier J and Kolesnikov, Alexander and Zhai, Xiaohua and Oord, A{\"a}ron van den}, + journal = {arXiv preprint arXiv:2006.07159}, + title = {Are we done with imagenet?}, + year = 2020} @article{biggio2014pattern, - title = {Pattern recognition systems under attack: Design issues and research challenges}, - author = {Biggio, Battista and Fumera, Giorgio and Roli, Fabio}, - year = 2014, - journal = {International Journal of Pattern Recognition and Artificial Intelligence}, - publisher = {World Scientific}, - volume = 28, - number = {07}, - pages = 1460002, -} - -@article{biggs2021natively, - title = {A natively flexible 32-bit Arm microprocessor}, - author = {Biggs, John and Myers, James and Kufel, Jedrzej and Ozer, Emre and Craske, Simon and Sou, Antony and Ramsdale, Catherine and Williamson, Ken and Price, Richard and White, Scott}, - year = 2021, - journal = {Nature}, - publisher = {Nature Publishing Group UK London}, - volume = 595, - number = 7868, - pages = {532--536}, -} - -@article{binkert2011gem5, - title = {The gem5 simulator}, - author = {Binkert, Nathan and Beckmann, Bradford and Black, Gabriel and Reinhardt, Steven K and Saidi, Ali and 
Basu, Arkaprava and Hestness, Joel and Hower, Derek R and Krishna, Tushar and Sardashti, Somayeh and others}, - year = 2011, - journal = {ACM SIGARCH computer architecture news}, - publisher = {ACM New York, NY, USA}, - volume = 39, - number = 2, - pages = {1--7}, -} + author = {Biggio, Battista and Fumera, Giorgio and Roli, Fabio}, + journal = {International Journal of Pattern Recognition and Artificial Intelligence}, + number = {07}, + pages = 1460002, + publisher = {World Scientific}, + title = {Pattern recognition systems under attack: Design issues and research challenges}, + volume = 28, + year = 2014} @misc{blalock_what_2020, - title = {What is the {State} of {Neural} {Network} {Pruning}?}, - author = {Blalock, Davis and Ortiz, Jose Javier Gonzalez and Frankle, Jonathan and Guttag, John}, - year = 2020, - month = mar, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2003.03033}, - urldate = {2023-10-20}, - note = {arXiv:2003.03033 [cs, stat]}, -} - -@inproceedings{brown_language_2020, - title = {Language {Models} are {Few}-{Shot} {Learners}}, - author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario}, - year = 2020, - booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, - publisher = {Curran Associates, Inc.}, - volume = 33, - pages = {1877--1901}, - url = {https://proceedings.neurips.cc/paper\_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html}, - urldate = {2023-11-07}, -} + 
abstract = {Neural network pruning---the task of reducing the size of a network by removing parameters---has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.}, + author = {Blalock, Davis and Ortiz, Jose Javier Gonzalez and Frankle, Jonathan and Guttag, John}, + doi = {10.48550/arXiv.2003.03033}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/MA4QGZ6E/Blalock et al. 
- 2020 - What is the State of Neural Network Pruning.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/8DFKG4GL/2003.html:text/html}, + keywords = {Computer Science - Machine Learning, Statistics - Machine Learning}, + month = mar, + note = {arXiv:2003.03033 [cs, stat]}, + publisher = {arXiv}, + title = {What is the {State} of {Neural} {Network} {Pruning}?}, + url = {http://arxiv.org/abs/2003.03033}, + urldate = {2023-10-20}, + year = 2020, + Bdsk-Url-1 = {http://arxiv.org/abs/2003.03033}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2003.03033}} @article{brown2020language, - title = {Language models are few-shot learners}, - author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others}, - year = 2020, - journal = {Advances in neural information processing systems}, - volume = 33, - pages = {1877--1901}, -} - -@article{burr2016recent, - title = {Recent progress in phase-change memory technology}, - author = {Burr, Geoffrey W and Brightsky, Matthew J and Sebastian, Abu and Cheng, Huai-Yu and Wu, Jau-Yi and Kim, Sangbum and Sosa, Norma E and Papandreou, Nikolaos and Lung, Hsiang-Lan and Pozidis, Haralampos and others}, - year = 2016, - journal = {IEEE Journal on Emerging and Selected Topics in Circuits and Systems}, - publisher = {IEEE}, - volume = 6, - number = 2, - pages = {146--162}, -} + author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others}, + journal = {Advances in neural information processing systems}, + pages = {1877--1901}, + title = {Language models are few-shot learners}, + volume = 33, + year = 2020} @inproceedings{cai_online_2021, - title = {Online {Continual} {Learning} with {Natural} {Distribution} {Shifts}: {An} {Empirical} 
{Study} with {Visual} {Data}}, - author = {Cai, Zhipeng and Sener, Ozan and Koltun, Vladlen}, - year = 2021, - month = oct, - booktitle = {2021 {IEEE}/{CVF} {International} {Conference} on {Computer} {Vision} ({ICCV})}, - publisher = {IEEE}, - address = {Montreal, QC, Canada}, - pages = {8261--8270}, - isbn = {978-1-66542-812-5}, - url = {https://ieeexplore.ieee.org/document/9710740/}, - urldate = {2023-10-26}, - language = {en}, -} + address = {Montreal, QC, Canada}, + author = {Cai, Zhipeng and Sener, Ozan and Koltun, Vladlen}, + booktitle = {2021 {IEEE}/{CVF} {International} {Conference} on {Computer} {Vision} ({ICCV})}, + doi = {10.1109/ICCV48922.2021.00817}, + file = {Cai et al. - 2021 - Online Continual Learning with Natural Distributio.pdf:/Users/alex/Zotero/storage/R7ZMIM4K/Cai et al. - 2021 - Online Continual Learning with Natural Distributio.pdf:application/pdf}, + isbn = {978-1-66542-812-5}, + language = {en}, + month = oct, + pages = {8261--8270}, + publisher = {IEEE}, + shorttitle = {Online {Continual} {Learning} with {Natural} {Distribution} {Shifts}}, + title = {Online {Continual} {Learning} with {Natural} {Distribution} {Shifts}: {An} {Empirical} {Study} with {Visual} {Data}}, + url = {https://ieeexplore.ieee.org/document/9710740/}, + urldate = {2023-10-26}, + year = 2021, + Bdsk-Url-1 = {https://ieeexplore.ieee.org/document/9710740/}, + Bdsk-Url-2 = {https://doi.org/10.1109/ICCV48922.2021.00817}} @article{cai_tinytl_nodate, - title = {{TinyTL}: {Reduce} {Memory}, {Not} {Parameters} for {Efficient} {On}-{Device} {Learning}}, - author = {Cai, Han and Gan, Chuang and Zhu, Ligeng and Han, Song}, - language = {en}, -} + author = {Cai, Han and Gan, Chuang and Zhu, Ligeng and Han, Song}, + file = {Cai et al. - TinyTL Reduce Memory, Not Parameters for Efficient.pdf:/Users/alex/Zotero/storage/J9C8PTCX/Cai et al. 
- TinyTL Reduce Memory, Not Parameters for Efficient.pdf:application/pdf}, + language = {en}, + title = {{TinyTL}: {Reduce} {Memory}, {Not} {Parameters} for {Efficient} {On}-{Device} {Learning}}} @article{cai2018proxylessnas, - title = {Proxylessnas: Direct neural architecture search on target task and hardware}, - author = {Cai, Han and Zhu, Ligeng and Han, Song}, - year = 2018, - journal = {arXiv preprint arXiv:1812.00332}, -} + author = {Cai, Han and Zhu, Ligeng and Han, Song}, + journal = {arXiv preprint arXiv:1812.00332}, + title = {Proxylessnas: Direct neural architecture search on target task and hardware}, + year = 2018} @article{cai2020tinytl, - title = {Tinytl: Reduce memory, not parameters for efficient on-device learning}, - author = {Cai, Han and Gan, Chuang and Zhu, Ligeng and Han, Song}, - year = 2020, - journal = {Advances in Neural Information Processing Systems}, - volume = 33, - pages = {11285--11297}, -} + author = {Cai, Han and Gan, Chuang and Zhu, Ligeng and Han, Song}, + journal = {Advances in Neural Information Processing Systems}, + pages = {11285--11297}, + title = {Tinytl: Reduce memory, not parameters for efficient on-device learning}, + volume = 33, + year = 2020} @article{Chapelle_Scholkopf_Zien, - title = {Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]}, - author = {Chapelle, O. and Scholkopf, B. and Zien, Eds., A.}, - year = 2009, - journal = {IEEE Transactions on Neural Networks}, - volume = 20, - number = 3, - pages = {542–542}, -} + author = {Chapelle, O. and Scholkopf, B. and Zien, Eds., A.}, + doi = {10.1109/tnn.2009.2015974}, + journal = {IEEE Transactions on Neural Networks}, + number = 3, + pages = {542--542}, + title = {Semi-supervised learning (Chapelle, O. 
et al., eds.; 2006) [book reviews]}, + volume = 20, + year = 2009, + Bdsk-Url-1 = {https://doi.org/10.1109/tnn.2009.2015974}} @misc{chen__inpainting_2022, - title = {Inpainting {Fluid} {Dynamics} with {Tensor} {Decomposition} ({NumPy})}, - author = {Chen (陈新宇), Xinyu}, - year = 2022, - month = mar, - journal = {Medium}, - url = {https://medium.com/\@xinyu.chen/inpainting-fluid-dynamics-with-tensor-decomposition-numpy-d84065fead4d}, - urldate = {2023-10-20}, - language = {en}, -} + abstract = {Some simple examples for showing how to use tensor decomposition to reconstruct fluid dynamics}, + author = {Chen (陈新宇), Xinyu}, + journal = {Medium}, + language = {en}, + month = mar, + title = {Inpainting {Fluid} {Dynamics} with {Tensor} {Decomposition} ({NumPy})}, + url = {https://medium.com/@xinyu.chen/inpainting-fluid-dynamics-with-tensor-decomposition-numpy-d84065fead4d}, + urldate = {2023-10-20}, + year = 2022, + Bdsk-Url-1 = {https://medium.com/@xinyu.chen/inpainting-fluid-dynamics-with-tensor-decomposition-numpy-d84065fead4d}} @misc{chen_tvm_2018, - title = {{TVM}: {An} {Automated} {End}-to-{End} {Optimizing} {Compiler} for {Deep} {Learning}}, - author = {Chen, Tianqi and Moreau, Thierry and Jiang, Ziheng and Zheng, Lianmin and Yan, Eddie and Cowan, Meghan and Shen, Haichen and Wang, Leyuan and Hu, Yuwei and Ceze, Luis and Guestrin, Carlos and Krishnamurthy, Arvind}, - year = 2018, - month = oct, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1802.04799}, - urldate = {2023-10-26}, - note = {arXiv:1802.04799 [cs]}, - language = {en}, -} + annote = {Comment: Significantly improved version, add automated optimization}, + author = {Chen, Tianqi and Moreau, Thierry and Jiang, Ziheng and Zheng, Lianmin and Yan, Eddie and Cowan, Meghan and Shen, Haichen and Wang, Leyuan and Hu, Yuwei and Ceze, Luis and Guestrin, Carlos and Krishnamurthy, Arvind}, + file = {Chen et al. 
- 2018 - TVM An Automated End-to-End Optimizing Compiler f.pdf:/Users/alex/Zotero/storage/QR8MHJ38/Chen et al. - 2018 - TVM An Automated End-to-End Optimizing Compiler f.pdf:application/pdf}, + keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Programming Languages}, + language = {en}, + month = oct, + note = {arXiv:1802.04799 [cs]}, + publisher = {arXiv}, + shorttitle = {{TVM}}, + title = {{TVM}: {An} {Automated} {End}-to-{End} {Optimizing} {Compiler} for {Deep} {Learning}}, + url = {http://arxiv.org/abs/1802.04799}, + urldate = {2023-10-26}, + year = 2018, + Bdsk-Url-1 = {http://arxiv.org/abs/1802.04799}} @article{chen2016training, - title = {Training deep nets with sublinear memory cost}, - author = {Chen, Tianqi and Xu, Bing and Zhang, Chiyuan and Guestrin, Carlos}, - year = 2016, - journal = {arXiv preprint arXiv:1604.06174}, -} + author = {Chen, Tianqi and Xu, Bing and Zhang, Chiyuan and Guestrin, Carlos}, + journal = {arXiv preprint arXiv:1604.06174}, + title = {Training deep nets with sublinear memory cost}, + year = 2016} @inproceedings{chen2018tvm, - title = {$\{$TVM\$\}\$: An automated \$\{\$End-to-End\$\}\$ optimizing compiler for deep learning}, - author = {Chen, Tianqi and Moreau, Thierry and Jiang, Ziheng and Zheng, Lianmin and Yan, Eddie and Shen, Haichen and Cowan, Meghan and Wang, Leyuan and Hu, Yuwei and Ceze, Luis and others}, - year = 2018, - booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)}, - pages = {578--594}, -} - -@article{Chen2023, - title = {A framework for integrating artificial intelligence for clinical care with continuous therapeutic monitoring}, - author = {Chen, Emma and Prakash, Shvetank and Janapa Reddi, Vijay and Kim, David and Rajpurkar, Pranav}, - year = 2023, - month = nov, - day = {06}, - journal = {Nature Biomedical Engineering}, - url = {https://doi.org/10.1038/s41551-023-01115-0}, -} + author = {Chen, Tianqi and 
Moreau, Thierry and Jiang, Ziheng and Zheng, Lianmin and Yan, Eddie and Shen, Haichen and Cowan, Meghan and Wang, Leyuan and Hu, Yuwei and Ceze, Luis and others}, + booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)}, + pages = {578--594}, + title = {$\{$TVM$\}$: An automated $\{$End-to-End$\}$ optimizing compiler for deep learning}, + year = 2018} @article{chen2023learning, - title = {Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning}, - author = {Chen, Zhiyong and Xu, Shugong}, - year = 2023, - journal = {EURASIP Journal on Audio, Speech, and Music Processing}, - publisher = {Springer}, - volume = 2023, - number = 1, - pages = 33, -} - -@article{cheng2017survey, - title = {A survey of model compression and acceleration for deep neural networks}, - author = {Cheng, Yu and Wang, Duo and Zhou, Pan and Zhang, Tao}, - year = 2017, - journal = {arXiv preprint arXiv:1710.09282}, -} - -@article{chi2016prime, - title = {Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory}, - author = {Chi, Ping and Li, Shuangchen and Xu, Cong and Zhang, Tao and Zhao, Jishen and Liu, Yongpan and Wang, Yu and Xie, Yuan}, - year = 2016, - journal = {ACM SIGARCH Computer Architecture News}, - publisher = {ACM New York, NY, USA}, - volume = 44, - number = 3, - pages = {27--39}, -} + author = {Chen, Zhiyong and Xu, Shugong}, + journal = {EURASIP Journal on Audio, Speech, and Music Processing}, + number = 1, + pages = 33, + publisher = {Springer}, + title = {Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning}, + volume = 2023, + year = 2023} @misc{chollet2015, - title = {keras}, - author = {Fran\c{c}ois Chollet}, - year = 2015, - journal = {GitHub repository}, - publisher = {GitHub}, + author = {Fran{\c c}ois Chollet}, + commit = {5bcac37}, howpublished = 
{\url{https://github.com/fchollet/keras}}, - commit = {5bcac37}, -} + journal = {GitHub repository}, + publisher = {GitHub}, + title = {keras}, + year = 2015} @article{chollet2018keras, - title = {Introduction to keras}, - author = {Chollet, Fran{\c{c}}ois}, - year = 2018, - journal = {March 9th}, -} + author = {Chollet, Fran{\c{c}}ois}, + journal = {March 9th}, + title = {Introduction to keras}, + year = 2018} + @inproceedings{chu2021discovering, - title = {Discovering multi-hardware mobile models via architecture search}, - author = {Chu, Grace and Arikan, Okan and Bender, Gabriel and Wang, Weijun and Brighton, Achille and Kindermans, Pieter-Jan and Liu, Hanxiao and Akin, Berkin and Gupta, Suyog and Howard, Andrew}, - year = 2021, - booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, - pages = {3022--3031}, - eprint = {2008.08178}, archiveprefix = {arXiv}, + author = {Chu, Grace and Arikan, Okan and Bender, Gabriel and Wang, Weijun and Brighton, Achille and Kindermans, Pieter-Jan and Liu, Hanxiao and Akin, Berkin and Gupta, Suyog and Howard, Andrew}, + booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + eprint = {2008.08178}, + pages = {3022--3031}, primaryclass = {cs.CV}, -} - -@article{chua1971memristor, - title = {Memristor-the missing circuit element}, - author = {Chua, Leon}, - year = 1971, - journal = {IEEE Transactions on circuit theory}, - publisher = {IEEE}, - volume = 18, - number = 5, - pages = {507--519}, -} + title = {Discovering multi-hardware mobile models via architecture search}, + year = 2021} @article{coleman2017dawnbench, - title = {Dawnbench: An end-to-end deep learning benchmark and competition}, - author = {Coleman, Cody and Narayanan, Deepak and Kang, Daniel and Zhao, Tian and Zhang, Jian and Nardi, Luigi and Bailis, Peter and Olukotun, Kunle and R{\'e}, Chris and Zaharia, Matei}, - year = 2017, - journal = {Training}, - volume = 100, - number = 
101, - pages = 102, -} + author = {Coleman, Cody and Narayanan, Deepak and Kang, Daniel and Zhao, Tian and Zhang, Jian and Nardi, Luigi and Bailis, Peter and Olukotun, Kunle and R{\'e}, Chris and Zaharia, Matei}, + journal = {Training}, + number = 101, + pages = 102, + title = {Dawnbench: An end-to-end deep learning benchmark and competition}, + volume = 100, + year = 2017} @inproceedings{coleman2022similarity, - title = {Similarity search for efficient active learning and search of rare concepts}, - author = {Coleman, Cody and Chou, Edward and Katz-Samuels, Julian and Culatana, Sean and Bailis, Peter and Berg, Alexander C and Nowak, Robert and Sumbaly, Roshan and Zaharia, Matei and Yalniz, I Zeki}, - year = 2022, - booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, - volume = 36, - number = 6, - pages = {6402--6410}, -} + author = {Coleman, Cody and Chou, Edward and Katz-Samuels, Julian and Culatana, Sean and Bailis, Peter and Berg, Alexander C and Nowak, Robert and Sumbaly, Roshan and Zaharia, Matei and Yalniz, I Zeki}, + booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, + number = 6, + pages = {6402--6410}, + title = {Similarity search for efficient active learning and search of rare concepts}, + volume = 36, + year = 2022} @misc{cottier_trends_2023, - title = {Trends in the {Dollar} {Training} {Cost} of {Machine} {Learning} {Systems}}, - author = {Cottier, Ben}, - year = 2023, - month = jan, - journal = {Epoch AI Report}, - url = {https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems}, -} - -@article{dally_evolution_2021, - title = {Evolution of the {Graphics} {Processing} {Unit} ({GPU})}, - author = {Dally, William J. and Keckler, Stephen W. 
and Kirk, David B.}, - year = 2021, - month = nov, - journal = {IEEE Micro}, - volume = 41, - number = 6, - pages = {42--51}, - url = {https://ieeexplore.ieee.org/document/9623445}, - urldate = {2023-11-07}, - note = {Conference Name: IEEE Micro}, -} - -@inproceedings{data_cascades, - title = {"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI}, - author = {Nithya Sambasivan and Shivani Kapania and Hannah Highfill and Diana Akrong and Praveen Kumar Paritosh and Lora Mois Aroyo}, - year = 2021, -} + author = {Cottier, Ben}, + journal = {Epoch AI Report}, + month = jan, + title = {Trends in the {Dollar} {Training} {Cost} of {Machine} {Learning} {Systems}}, + url = {https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems}, + year = 2023, + Bdsk-Url-1 = {https://epochai.org/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems}} @misc{david_tensorflow_2021, - title = {{TensorFlow} {Lite} {Micro}: {Embedded} {Machine} {Learning} on {TinyML} {Systems}}, - author = {David, Robert and Duke, Jared and Jain, Advait and Reddi, Vijay Janapa and Jeffries, Nat and Li, Jian and Kreeger, Nick and Nappier, Ian and Natraj, Meghna and Regev, Shlomi and Rhodes, Rocky and Wang, Tiezhen and Warden, Pete}, - year = 2021, - month = mar, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2010.08678}, - urldate = {2023-10-26}, - note = {arXiv:2010.08678 [cs]}, - language = {en}, -} + author = {David, Robert and Duke, Jared and Jain, Advait and Reddi, Vijay Janapa and Jeffries, Nat and Li, Jian and Kreeger, Nick and Nappier, Ian and Natraj, Meghna and Regev, Shlomi and Rhodes, Rocky and Wang, Tiezhen and Warden, Pete}, + file = {David et al. - 2021 - TensorFlow Lite Micro Embedded Machine Learning o.pdf:/Users/alex/Zotero/storage/YCFVNEVH/David et al. 
- 2021 - TensorFlow Lite Micro Embedded Machine Learning o.pdf:application/pdf}, + keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning}, + language = {en}, + month = mar, + note = {arXiv:2010.08678 [cs]}, + publisher = {arXiv}, + shorttitle = {{TensorFlow} {Lite} {Micro}}, + title = {{TensorFlow} {Lite} {Micro}: {Embedded} {Machine} {Learning} on {TinyML} {Systems}}, + url = {http://arxiv.org/abs/2010.08678}, + urldate = {2023-10-26}, + year = 2021, + Bdsk-Url-1 = {http://arxiv.org/abs/2010.08678}} @article{david2021tensorflow, - title = {Tensorflow lite micro: Embedded machine learning for tinyml systems}, - author = {David, Robert and Duke, Jared and Jain, Advait and Janapa Reddi, Vijay and Jeffries, Nat and Li, Jian and Kreeger, Nick and Nappier, Ian and Natraj, Meghna and Wang, Tiezhen and others}, - year = 2021, - journal = {Proceedings of Machine Learning and Systems}, - volume = 3, - pages = {800--811}, -} - -@article{davies2018loihi, - title = {Loihi: A neuromorphic manycore processor with on-chip learning}, - author = {Davies, Mike and Srinivasa, Narayan and Lin, Tsung-Han and Chinya, Gautham and Cao, Yongqiang and Choday, Sri Harsha and Dimou, Georgios and Joshi, Prasad and Imam, Nabil and Jain, Shweta and others}, - year = 2018, - journal = {Ieee Micro}, - publisher = {IEEE}, - volume = 38, - number = 1, - pages = {82--99}, -} - -@article{davies2021advancing, - title = {Advancing neuromorphic computing with loihi: A survey of results and outlook}, - author = {Davies, Mike and Wild, Andreas and Orchard, Garrick and Sandamirskaya, Yulia and Guerra, Gabriel A Fonseca and Joshi, Prasad and Plank, Philipp and Risbud, Sumedh R}, - year = 2021, - journal = {Proceedings of the IEEE}, - publisher = {IEEE}, - volume = 109, - number = 5, - pages = {911--934}, -} - -@misc{dean_jeff_numbers_nodate, - title = {Numbers {Everyone} {Should} {Know}}, - author = {Dean. 
Jeff}, - url = {https://brenocon.com/dean\_perf.html}, - urldate = {2023-11-07}, -} + author = {David, Robert and Duke, Jared and Jain, Advait and Janapa Reddi, Vijay and Jeffries, Nat and Li, Jian and Kreeger, Nick and Nappier, Ian and Natraj, Meghna and Wang, Tiezhen and others}, + journal = {Proceedings of Machine Learning and Systems}, + pages = {800--811}, + title = {Tensorflow lite micro: Embedded machine learning for tinyml systems}, + volume = 3, + year = 2021} @article{dean2012large, - title = {Large scale distributed deep networks}, - author = {Dean, Jeffrey and Corrado, Greg and Monga, Rajat and Chen, Kai and Devin, Matthieu and Mao, Mark and Ranzato, Marc'aurelio and Senior, Andrew and Tucker, Paul and Yang, Ke and others}, - year = 2012, - journal = {Advances in neural information processing systems}, - volume = 25, -} + author = {Dean, Jeffrey and Corrado, Greg and Monga, Rajat and Chen, Kai and Devin, Matthieu and Mao, Mark and Ranzato, Marc'aurelio and Senior, Andrew and Tucker, Paul and Yang, Ke and others}, + journal = {Advances in neural information processing systems}, + title = {Large scale distributed deep networks}, + volume = 25, + year = 2012} @misc{deci, - title = {The Ultimate Guide to Deep Learning Model Quantization and Quantization-Aware Training}, - url = {https://deci.ai/quantization-and-quantization-aware-training/}, -} + title = {The Ultimate Guide to Deep Learning Model Quantization and Quantization-Aware Training}, + url = {https://deci.ai/quantization-and-quantization-aware-training/}, + Bdsk-Url-1 = {https://deci.ai/quantization-and-quantization-aware-training/}} @misc{deepcompress, - title = {Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding}, - author = {Han and Mao and Dally}, - year = 2016, - url = {https://arxiv.org/abs/1510.00149}, - urldate = {2016-02-15}, -} - -@article{demler_ceva_2020, - title = {{CEVA} {SENSPRO} {FUSES} {AI} {AND} {VECTOR} {DSP}}, - author = 
{Demler, Mike}, - year = 2020, - language = {en}, -} + abstract = {Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9x to 13x; Quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49x from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained. 
Benchmarked on CPU, GPU and mobile GPU, compressed network has 3x to 4x layerwise speedup and 3x to 7x better energy efficiency.}, + author = {Han and Mao and Dally}, + doi = {10.48550/arXiv.1510.00149}, + title = {Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding}, + url = {https://arxiv.org/abs/1510.00149}, + urldate = {2016-02-15}, + year = 2016, + Bdsk-Url-1 = {https://arxiv.org/abs/1510.00149}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1510.00149}} @inproceedings{deng2009imagenet, - title = {ImageNet: A large-scale hierarchical image database}, - author = {Deng, Jia and Socher, R. and Fei-Fei, Li and Dong, Wei and Li, Kai and Li, Li-Jia}, - year = 2009, - month = {06}, - booktitle = {2009 IEEE Conference on Computer Vision and Pattern Recognition(CVPR)}, - volume = 00, - pages = {248--255}, - url = {https://ieeexplore.ieee.org/abstract/document/5206848/}, - added-at = {2018-09-20T15:22:39.000+0200}, - biburl = {https://www.bibsonomy.org/bibtex/252793859f5bcbbd3f7f9e5d083160acf/analyst}, - description = {ImageNet: A large-scale hierarchical image database}, - interhash = {fbfae3e4fe1a81c477ba00efd0d4d977}, - intrahash = {52793859f5bcbbd3f7f9e5d083160acf}, - timestamp = {2018-09-20T15:22:39.000+0200}, -} + added-at = {2018-09-20T15:22:39.000+0200}, + author = {Deng, Jia and Socher, R. 
and Fei-Fei, Li and Dong, Wei and Li, Kai and Li, Li-Jia}, + biburl = {https://www.bibsonomy.org/bibtex/252793859f5bcbbd3f7f9e5d083160acf/analyst}, + booktitle = {2009 IEEE Conference on Computer Vision and Pattern Recognition(CVPR)}, + description = {ImageNet: A large-scale hierarchical image database}, + doi = {10.1109/CVPR.2009.5206848}, + interhash = {fbfae3e4fe1a81c477ba00efd0d4d977}, + intrahash = {52793859f5bcbbd3f7f9e5d083160acf}, + keywords = {2009 computer-vision cvpr dataset ieee paper}, + month = {06}, + pages = {248--255}, + timestamp = {2018-09-20T15:22:39.000+0200}, + title = {ImageNet: A large-scale hierarchical image database}, + url = {https://ieeexplore.ieee.org/abstract/document/5206848/}, + volume = 00, + year = 2009, + Bdsk-Url-1 = {https://ieeexplore.ieee.org/abstract/document/5206848/}, + Bdsk-Url-2 = {https://doi.org/10.1109/CVPR.2009.5206848}} @article{desai2016five, - title = {Five Safes: designing data access for research}, - author = {Desai, Tanvi and Ritchie, Felix and Welpton, Richard and others}, - year = 2016, - journal = {Economics Working Paper Series}, - volume = 1601, - pages = 28, -} + author = {Desai, Tanvi and Ritchie, Felix and Welpton, Richard and others}, + journal = {Economics Working Paper Series}, + pages = 28, + title = {Five Safes: designing data access for research}, + volume = 1601, + year = 2016} @article{desai2020five, - title = {Five Safes: designing data access for research; 2016}, - author = {Desai, Tanvi and Ritchie, Felix and Welpton, Richard}, - year = 2020, - journal = {URL https://www2. uwe. ac. uk/faculties/bbs/Documents/1601. pdf}, -} + author = {Desai, Tanvi and Ritchie, Felix and Welpton, Richard}, + journal = {URL https://www2. uwe. ac. uk/faculties/bbs/Documents/1601. 
pdf}, + title = {Five Safes: designing data access for research; 2016}, + year = 2020} @article{devlin2018bert, - title = {Bert: Pre-training of deep bidirectional transformers for language understanding}, - author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, - year = 2018, - journal = {arXiv preprint arXiv:1810.04805}, -} + author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, + journal = {arXiv preprint arXiv:1810.04805}, + title = {Bert: Pre-training of deep bidirectional transformers for language understanding}, + year = 2018} @article{dhar2021survey, - title = {A survey of on-device machine learning: An algorithms and learning theory perspective}, - author = {Dhar, Sauptik and Guo, Junyao and Liu, Jiayi and Tripathi, Samarth and Kurup, Unmesh and Shah, Mohak}, - year = 2021, - journal = {ACM Transactions on Internet of Things}, - publisher = {ACM New York, NY, USA}, - volume = 2, - number = 3, - pages = {1--49}, -} + author = {Dhar, Sauptik and Guo, Junyao and Liu, Jiayi and Tripathi, Samarth and Kurup, Unmesh and Shah, Mohak}, + journal = {ACM Transactions on Internet of Things}, + number = 3, + pages = {1--49}, + publisher = {ACM New York, NY, USA}, + title = {A survey of on-device machine learning: An algorithms and learning theory perspective}, + volume = 2, + year = 2021} @misc{dong2022splitnets, - title = {SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems}, - author = {Xin Dong and Barbara De Salvo and Meng Li and Chiao Liu and Zhongnan Qu and H. T. Kung and Ziyun Li}, - year = 2022, - eprint = {2204.04705}, archiveprefix = {arXiv}, + author = {Xin Dong and Barbara De Salvo and Meng Li and Chiao Liu and Zhongnan Qu and H. T. 
Kung and Ziyun Li}, + eprint = {2204.04705}, primaryclass = {cs.LG}, -} - -@article{Dongarra2009-na, - title = {The evolution of high performance computing on system z}, - author = {Dongarra, Jack J}, - year = 2009, - journal = {IBM Journal of Research and Development}, - volume = 53, - pages = {3--4}, -} - -@article{duarte2022fastml, - title = {FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning}, - author = {Duarte, Javier and Tran, Nhan and Hawks, Ben and Herwig, Christian and Muhizi, Jules and Prakash, Shvetank and Reddi, Vijay Janapa}, - year = 2022, - journal = {arXiv preprint arXiv:2207.07958}, -} + title = {SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems}, + year = 2022} @article{duisterhof2019learning, - title = {Learning to seek: Autonomous source seeking with deep reinforcement learning onboard a nano drone microcontroller}, - author = {Duisterhof, Bardienus P and Krishnan, Srivatsan and Cruz, Jonathan J and Banbury, Colby R and Fu, William and Faust, Aleksandra and de Croon, Guido CHE and Reddi, Vijay Janapa}, - year = 2019, - journal = {arXiv preprint arXiv:1909.11236}, -} + author = {Duisterhof, Bardienus P and Krishnan, Srivatsan and Cruz, Jonathan J and Banbury, Colby R and Fu, William and Faust, Aleksandra and de Croon, Guido CHE and Reddi, Vijay Janapa}, + journal = {arXiv preprint arXiv:1909.11236}, + title = {Learning to seek: Autonomous source seeking with deep reinforcement learning onboard a nano drone microcontroller}, + year = 2019} @inproceedings{duisterhof2021sniffy, - title = {Sniffy bug: A fully autonomous swarm of gas-seeking nano quadcopters in cluttered environments}, - author = {Duisterhof, Bardienus P and Li, Shushuai and Burgu{\'e}s, Javier and Reddi, Vijay Janapa and de Croon, Guido CHE}, - year = 2021, - booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, - pages = {9099--9106}, + author = {Duisterhof, 
Bardienus P and Li, Shushuai and Burgu{\'e}s, Javier and Reddi, Vijay Janapa and de Croon, Guido CHE}, + booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, organization = {IEEE}, -} + pages = {9099--9106}, + title = {Sniffy bug: A fully autonomous swarm of gas-seeking nano quadcopters in cluttered environments}, + year = 2021} @article{dwork2014algorithmic, - title = {The algorithmic foundations of differential privacy}, - author = {Dwork, Cynthia and Roth, Aaron and others}, - year = 2014, - journal = {Foundations and Trends{\textregistered} in Theoretical Computer Science}, - publisher = {Now Publishers, Inc.}, - volume = 9, - number = {3--4}, - pages = {211--407}, -} - -@article{el-rayis_reconfigurable_nodate, - title = {Reconfigurable {Architectures} for the {Next} {Generation} of {Mobile} {Device} {Telecommunications} {Systems}}, - author = {El-Rayis, Ahmed Osman}, - language = {en}, -} + author = {Dwork, Cynthia and Roth, Aaron and others}, + journal = {Foundations and Trends{\textregistered} in Theoretical Computer Science}, + number = {3--4}, + pages = {211--407}, + publisher = {Now Publishers, Inc.}, + title = {The algorithmic foundations of differential privacy}, + volume = 9, + year = 2014} @article{electronics12102287, - title = {Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations and Future Perspectives}, - author = {Moshawrab, Mohammad and Adda, Mehdi and Bouzouane, Abdenour and Ibrahim, Hussein and Raad, Ali}, - year = 2023, - journal = {Electronics}, - volume = 12, - number = 10, - url = {https://www.mdpi.com/2079-9292/12/10/2287}, article-number = 2287, -} + author = {Moshawrab, Mohammad and Adda, Mehdi and Bouzouane, Abdenour and Ibrahim, Hussein and Raad, Ali}, + doi = {10.3390/electronics12102287}, + issn = {2079-9292}, + journal = {Electronics}, + number = 10, + title = {Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations 
and Future Perspectives}, + url = {https://www.mdpi.com/2079-9292/12/10/2287}, + volume = 12, + year = 2023, + Bdsk-Url-1 = {https://www.mdpi.com/2079-9292/12/10/2287}, + Bdsk-Url-2 = {https://doi.org/10.3390/electronics12102287}} @misc{energyproblem, - title = {Computing's energy problem (and what we can do about it)}, - author = {ISSCC}, - year = 2014, - url = {https://ieeexplore.ieee.org/document/6757323}, - urldate = {2014-03-06}, -} + author = {ISSCC}, + title = {Computing's energy problem (and what we can do about it)}, + url = {https://ieeexplore.ieee.org/document/6757323}, + urldate = {2014-03-06}, + year = 2014, + Bdsk-Url-1 = {https://ieeexplore.ieee.org/document/6757323}} @article{esteva2017dermatologist, - title = {Dermatologist-level classification of skin cancer with deep neural networks}, - author = {Esteva, Andre and Kuprel, Brett and Novoa, Roberto A and Ko, Justin and Swetter, Susan M and Blau, Helen M and Thrun, Sebastian}, - year = 2017, - journal = {nature}, - publisher = {Nature Publishing Group}, - volume = 542, - number = 7639, - pages = {115--118}, -} + author = {Esteva, Andre and Kuprel, Brett and Novoa, Roberto A and Ko, Justin and Swetter, Susan M and Blau, Helen M and Thrun, Sebastian}, + journal = {nature}, + number = 7639, + pages = {115--118}, + publisher = {Nature Publishing Group}, + title = {Dermatologist-level classification of skin cancer with deep neural networks}, + volume = 542, + year = 2017} @misc{fahim2021hls4ml, - title = {hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices}, - author = {Farah Fahim and Benjamin Hawks and Christian Herwig and James Hirschauer and Sergo Jindariani and Nhan Tran and Luca P. 
Carloni and Giuseppe Di Guglielmo and Philip Harris and Jeffrey Krupa and Dylan Rankin and Manuel Blanco Valentin and Josiah Hester and Yingyi Luo and John Mamish and Seda Orgrenci-Memik and Thea Aarrestad and Hamza Javed and Vladimir Loncar and Maurizio Pierini and Adrian Alan Pol and Sioni Summers and Javier Duarte and Scott Hauck and Shih-Chieh Hsu and Jennifer Ngadiuba and Mia Liu and Duc Hoang and Edward Kreinar and Zhenbin Wu}, - year = 2021, - eprint = {2103.05579}, archiveprefix = {arXiv}, + author = {Farah Fahim and Benjamin Hawks and Christian Herwig and James Hirschauer and Sergo Jindariani and Nhan Tran and Luca P. Carloni and Giuseppe Di Guglielmo and Philip Harris and Jeffrey Krupa and Dylan Rankin and Manuel Blanco Valentin and Josiah Hester and Yingyi Luo and John Mamish and Seda Orgrenci-Memik and Thea Aarrestad and Hamza Javed and Vladimir Loncar and Maurizio Pierini and Adrian Alan Pol and Sioni Summers and Javier Duarte and Scott Hauck and Shih-Chieh Hsu and Jennifer Ngadiuba and Mia Liu and Duc Hoang and Edward Kreinar and Zhenbin Wu}, + eprint = {2103.05579}, primaryclass = {cs.LG}, -} - -@article{farah2005neuroethics, - title = {Neuroethics: the practical and the philosophical}, - author = {Farah, Martha J}, - year = 2005, - journal = {Trends in cognitive sciences}, - publisher = {Elsevier}, - volume = 9, - number = 1, - pages = {34--40}, -} - -@inproceedings{fowers2018configurable, - title = {A configurable cloud-scale DNN processor for real-time AI}, - author = {Fowers, Jeremy and Ovtcharov, Kalin and Papamichael, Michael and Massengill, Todd and Liu, Ming and Lo, Daniel and Alkalay, Shlomi and Haselman, Michael and Adams, Logan and Ghandi, Mahdi and others}, - year = 2018, - booktitle = {2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)}, - pages = {1--14}, - organization = {IEEE}, -} + title = {hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices}, + year = 2021} 
@misc{frankle_lottery_2019, - title = {The {Lottery} {Ticket} {Hypothesis}: {Finding} {Sparse}, {Trainable} {Neural} {Networks}}, - author = {Frankle, Jonathan and Carbin, Michael}, - year = 2019, - month = mar, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1803.03635}, - urldate = {2023-10-20}, - note = {arXiv:1803.03635 [cs]}, -} - -@article{furber2016large, - title = {Large-scale neuromorphic computing systems}, - author = {Furber, Steve}, - year = 2016, - journal = {Journal of neural engineering}, - publisher = {IOP Publishing}, - volume = 13, - number = 5, - pages = {051001}, -} - -@article{gaitathome, - title = {Monitoring gait at home with radio waves in Parkinson's disease: A marker of severity, progression, and medication response}, - author = {Yingcheng Liu and Guo Zhang and Christopher G. Tarolli and Rumen Hristov and Stella Jensen-Roberts and Emma M. Waddell and Taylor L. Myers and Meghan E. Pawlik and Julia M. Soto and Renee M. Wilson and Yuzhe Yang and Timothy Nordahl and Karlo J. Lizarraga and Jamie L. Adams and Ruth B. Schneider and Karl Kieburtz and Terry Ellis and E. Ray Dorsey and Dina Katabi}, - year = 2022, - journal = {Science Translational Medicine}, - volume = 14, - number = 663, - pages = {eadc9669}, - url = {https://www.science.org/doi/abs/10.1126/scitranslmed.adc9669}, - eprint = {https://www.science.org/doi/pdf/10.1126/scitranslmed.adc9669}, -} - -@article{gale2019state, - title = {The state of sparsity in deep neural networks}, - author = {Gale, Trevor and Elsen, Erich and Hooker, Sara}, - year = 2019, - journal = {arXiv preprint arXiv:1902.09574}, -} - -@inproceedings{gannot1994verilog, - title = {Verilog HDL based FPGA design}, - author = {Gannot, G. 
and Ligthart, M.}, - year = 1994, - booktitle = {International Verilog HDL Conference}, - pages = {86--92}, -} - -@article{gates2009flexible, - title = {Flexible electronics}, - author = {Gates, Byron D}, - year = 2009, - journal = {Science}, - publisher = {American Association for the Advancement of Science}, - volume = 323, - number = 5921, - pages = {1566--1567}, -} + abstract = {Neural network pruning techniques can reduce the parameter counts of trained networks by over 90\%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20\% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. 
Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.}, + author = {Frankle, Jonathan and Carbin, Michael}, + doi = {10.48550/arXiv.1803.03635}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/6STHYGW5/Frankle and Carbin - 2019 - The Lottery Ticket Hypothesis Finding Sparse, Tra.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/QGNSCTQB/1803.html:text/html}, + keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing}, + month = mar, + note = {arXiv:1803.03635 [cs]}, + publisher = {arXiv}, + shorttitle = {The {Lottery} {Ticket} {Hypothesis}}, + title = {The {Lottery} {Ticket} {Hypothesis}: {Finding} {Sparse}, {Trainable} {Neural} {Networks}}, + url = {http://arxiv.org/abs/1803.03635}, + urldate = {2023-10-20}, + year = 2019, + Bdsk-Url-1 = {http://arxiv.org/abs/1803.03635}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1803.03635}} @article{gaviria2022dollar, - title = {The Dollar Street Dataset: Images Representing the Geographic and Socioeconomic Diversity of the World}, - author = {Gaviria Rojas, William and Diamos, Sudnya and Kini, Keertan and Kanter, David and Janapa Reddi, Vijay and Coleman, Cody}, - year = 2022, - journal = {Advances in Neural Information Processing Systems}, - volume = 35, - pages = {12979--12990}, -} + author = {Gaviria Rojas, William and Diamos, Sudnya and Kini, Keertan and Kanter, David and Janapa Reddi, Vijay and Coleman, Cody}, + journal = {Advances in Neural Information Processing Systems}, + pages = {12979--12990}, + title = {The Dollar Street Dataset: Images Representing the Geographic and Socioeconomic Diversity of the World}, + volume = 35, + year = 2022} @article{Gebru_Morgenstern_Vecchione_Vaughan_Wallach_III_Crawford_2021, - title = {Datasheets for datasets}, - author = {Gebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, 
Jennifer Wortman and Wallach, Hanna and III, Hal Daum\'{e} and Crawford, Kate}, - year = 2021, - journal = {Communications of the ACM}, - volume = 64, - number = 12, - pages = {86–92}, -} - -@article{glucosemonitor, - title = {Non-invasive Monitoring of Three Glucose Ranges Based On ECG By Using DBSCAN-CNN}, - author = {Li, Jingzhen and Tobore, Igbe and Liu, Yuhang and Kandwal, Abhishek and Wang, Lei and Nie, Zedong}, - year = 2021, - journal = {IEEE Journal of Biomedical and Health Informatics}, - volume = 25, - number = 9, - pages = {3340--3350}, -} + author = {Gebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, Jennifer Wortman and Wallach, Hanna and III, Hal Daum{\'e} and Crawford, Kate}, + doi = {10.1145/3458723}, + journal = {Communications of the ACM}, + number = 12, + pages = {86--92}, + title = {Datasheets for datasets}, + volume = 64, + year = 2021, + Bdsk-Url-1 = {https://doi.org/10.1145/3458723}} @article{goodfellow2020generative, - title = {Generative adversarial networks}, - author = {Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua}, - year = 2020, - journal = {Communications of the ACM}, - publisher = {ACM New York, NY, USA}, - volume = 63, - number = 11, - pages = {139--144}, -} - -@article{goodyear2017social, - title = {Social media, apps and wearable technologies: navigating ethical dilemmas and procedures}, - author = {Goodyear, Victoria A}, - year = 2017, - journal = {Qualitative research in sport, exercise and health}, - publisher = {Taylor \& Francis}, - volume = 9, - number = 3, - pages = {285--302}, -} + author = {Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua}, + journal = {Communications of the ACM}, + number = 11, + pages = {139--144}, + publisher = {ACM New York, NY, USA}, + title = {Generative adversarial 
networks}, + volume = 63, + year = 2020} @misc{Google, - title = {Information quality \& content moderation}, - author = {Google}, - url = {https://blog.google/documents/83/}, -} + author = {Google}, + title = {Information quality \& content moderation}, + url = {https://blog.google/documents/83/}, + Bdsk-Url-1 = {https://blog.google/documents/83/}} @misc{gordon_morphnet_2018, - title = {{MorphNet}: {Fast} \& {Simple} {Resource}-{Constrained} {Structure} {Learning} of {Deep} {Networks}}, - author = {Gordon, Ariel and Eban, Elad and Nachum, Ofir and Chen, Bo and Wu, Hao and Yang, Tien-Ju and Choi, Edward}, - year = 2018, - month = apr, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1711.06798}, - urldate = {2023-10-20}, - note = {arXiv:1711.06798 [cs, stat]}, -} + abstract = {We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.}, + author = {Gordon, Ariel and Eban, Elad and Nachum, Ofir and Chen, Bo and Wu, Hao and Yang, Tien-Ju and Choi, Edward}, + doi = {10.48550/arXiv.1711.06798}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/GV7N4CZC/Gordon et al. 
- 2018 - MorphNet Fast & Simple Resource-Constrained Struc.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/K6FUV82F/1711.html:text/html}, + keywords = {Computer Science - Machine Learning, Statistics - Machine Learning}, + month = apr, + note = {arXiv:1711.06798 [cs, stat]}, + publisher = {arXiv}, + shorttitle = {{MorphNet}}, + title = {{MorphNet}: {Fast} \& {Simple} {Resource}-{Constrained} {Structure} {Learning} of {Deep} {Networks}}, + url = {http://arxiv.org/abs/1711.06798}, + urldate = {2023-10-20}, + year = 2018, + Bdsk-Url-1 = {http://arxiv.org/abs/1711.06798}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1711.06798}} @inproceedings{gordon2018morphnet, - title = {Morphnet: Fast \& simple resource-constrained structure learning of deep networks}, - author = {Gordon, Ariel and Eban, Elad and Nachum, Ofir and Chen, Bo and Wu, Hao and Yang, Tien-Ju and Choi, Edward}, - year = 2018, - booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, - pages = {1586--1595}, -} + author = {Gordon, Ariel and Eban, Elad and Nachum, Ofir and Chen, Bo and Wu, Hao and Yang, Tien-Ju and Choi, Edward}, + booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, + pages = {1586--1595}, + title = {Morphnet: Fast \& simple resource-constrained structure learning of deep networks}, + year = 2018} @article{gruslys2016memory, - title = {Memory-efficient backpropagation through time}, - author = {Gruslys, Audrunas and Munos, R{\'e}mi and Danihelka, Ivo and Lanctot, Marc and Graves, Alex}, - year = 2016, - journal = {Advances in neural information processing systems}, - volume = 29, -} - -@article{gwennap_certus-nx_nodate, - title = {Certus-{NX} {Innovates} {General}-{Purpose} {FPGAs}}, - author = {Gwennap, Linley}, - language = {en}, -} - -@article{haensch2018next, - title = {The next generation of deep learning hardware: Analog computing}, - author = {Haensch, Wilfried and Gokmen, Tayfun and 
Puri, Ruchir}, - year = 2018, - journal = {Proceedings of the IEEE}, - publisher = {IEEE}, - volume = 107, - number = 1, - pages = {108--122}, -} + author = {Gruslys, Audrunas and Munos, R{\'e}mi and Danihelka, Ivo and Lanctot, Marc and Graves, Alex}, + journal = {Advances in neural information processing systems}, + title = {Memory-efficient backpropagation through time}, + volume = 29, + year = 2016} @article{han2015deep, - title = {Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding}, - author = {Han, Song and Mao, Huizi and Dally, William J}, - year = 2015, - journal = {arXiv preprint arXiv:1510.00149}, -} + author = {Han, Song and Mao, Huizi and Dally, William J}, + journal = {arXiv preprint arXiv:1510.00149}, + title = {Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding}, + year = 2015} @misc{han2016deep, - title = {Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding}, - author = {Song Han and Huizi Mao and William J. Dally}, - year = 2016, - eprint = {1510.00149}, archiveprefix = {arXiv}, + author = {Song Han and Huizi Mao and William J. 
Dally}, + eprint = {1510.00149}, primaryclass = {cs.CV}, -} - -@article{hazan2021neuromorphic, - title = {Neuromorphic analog implementation of neural engineering framework-inspired spiking neuron for high-dimensional representation}, - author = {Hazan, Avi and Ezra Tsur, Elishai}, - year = 2021, - journal = {Frontiers in Neuroscience}, - publisher = {Frontiers Media SA}, - volume = 15, - pages = 627221, -} + title = {Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding}, + year = 2016} @misc{he_structured_2023, - title = {Structured {Pruning} for {Deep} {Convolutional} {Neural} {Networks}: {A} survey}, - author = {He, Yang and Xiao, Lingao}, - year = 2023, - month = mar, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2303.00566}, - urldate = {2023-10-20}, - note = {arXiv:2303.00566 [cs]}, -} + abstract = {The remarkable performance of deep Convolutional neural networks (CNNs) is generally attributed to their deeper and wider architectures, which can come with significant computational costs. Pruning neural networks has thus gained interest since it effectively lowers storage and computational costs. In contrast to weight pruning, which results in unstructured models, structured pruning provides the benefit of realistic acceleration by producing models that are friendly to hardware implementation. The special requirements of structured pruning have led to the discovery of numerous new challenges and the development of innovative solutions. This article surveys the recent progress towards structured pruning of deep CNNs. We summarize and compare the state-of-the-art structured pruning techniques with respect to filter ranking methods, regularization methods, dynamic execution, neural architecture search, the lottery ticket hypothesis, and the applications of pruning. While discussing structured pruning algorithms, we briefly introduce the unstructured pruning counterpart to emphasize their differences. 
Furthermore, we provide insights into potential research opportunities in the field of structured pruning. A curated list of neural network pruning papers can be found at https://github.com/he-y/Awesome-Pruning}, + author = {He, Yang and Xiao, Lingao}, + doi = {10.48550/arXiv.2303.00566}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/K5RGQQA9/He and Xiao - 2023 - Structured Pruning for Deep Convolutional Neural N.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/U7PVPU4C/2303.html:text/html}, + keywords = {Computer Science - Computer Vision and Pattern Recognition}, + month = mar, + note = {arXiv:2303.00566 [cs]}, + publisher = {arXiv}, + shorttitle = {Structured {Pruning} for {Deep} {Convolutional} {Neural} {Networks}}, + title = {Structured {Pruning} for {Deep} {Convolutional} {Neural} {Networks}: {A} survey}, + url = {http://arxiv.org/abs/2303.00566}, + urldate = {2023-10-20}, + year = 2023, + Bdsk-Url-1 = {http://arxiv.org/abs/2303.00566}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2303.00566}} @inproceedings{he2016deep, - title = {Deep residual learning for image recognition}, - author = {He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian}, - year = 2016, - booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, - pages = {770--778}, -} + author = {He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian}, + booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, + pages = {770--778}, + title = {Deep residual learning for image recognition}, + year = 2016} @inproceedings{hendrycks2021natural, - title = {Natural adversarial examples}, - author = {Hendrycks, Dan and Zhao, Kevin and Basart, Steven and Steinhardt, Jacob and Song, Dawn}, - year = 2021, - booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, - pages = {15262--15271}, -} - -@article{Hennessy2019-je, - title = {A new golden age 
for computer architecture}, - author = {Hennessy, John L and Patterson, David A}, - year = 2019, - month = jan, - journal = {Commun. ACM}, - publisher = {Association for Computing Machinery (ACM)}, - volume = 62, - number = 2, - pages = {48--60}, - copyright = {http://www.acm.org/publications/policies/copyright\_policy\#Background}, - language = {en}, -} + author = {Hendrycks, Dan and Zhao, Kevin and Basart, Steven and Steinhardt, Jacob and Song, Dawn}, + booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages = {15262--15271}, + title = {Natural adversarial examples}, + year = 2021} @misc{hinton_distilling_2015, - title = {Distilling the {Knowledge} in a {Neural} {Network}}, - author = {Hinton, Geoffrey and Vinyals, Oriol and Dean, Jeff}, - year = 2015, - month = mar, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1503.02531}, - urldate = {2023-10-20}, - note = {arXiv:1503.02531 [cs, stat]}, -} + abstract = {A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. 
We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.}, + author = {Hinton, Geoffrey and Vinyals, Oriol and Dean, Jeff}, + doi = {10.48550/arXiv.1503.02531}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/VREDW45A/Hinton et al. - 2015 - Distilling the Knowledge in a Neural Network.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/8MNJG4RP/1503.html:text/html}, + keywords = {Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning}, + month = mar, + note = {arXiv:1503.02531 [cs, stat]}, + publisher = {arXiv}, + title = {Distilling the {Knowledge} in a {Neural} {Network}}, + url = {http://arxiv.org/abs/1503.02531}, + urldate = {2023-10-20}, + year = 2015, + Bdsk-Url-1 = {http://arxiv.org/abs/1503.02531}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1503.02531}} @misc{hinton2015distilling, - title = {Distilling the Knowledge in a Neural Network}, - author = {Geoffrey Hinton and Oriol Vinyals and Jeff Dean}, - year = 2015, - eprint = {1503.02531}, archiveprefix = {arXiv}, + author = {Geoffrey Hinton and Oriol Vinyals and Jeff Dean}, + eprint = {1503.02531}, primaryclass = {stat.ML}, -} + title = {Distilling the Knowledge in a Neural Network}, + year = 2015} @article{Holland_Hosny_Newman_Joseph_Chmielinski_2020, - title = {The Dataset Nutrition label}, - author = {Holland, Sarah and Hosny, Ahmed and Newman, Sarah and Joseph, Joshua and Chmielinski, Kasia}, - year = 2020, - journal = {Data Protection and Privacy}, -} + author = {Holland, Sarah and Hosny, Ahmed and Newman, Sarah and Joseph, Joshua and Chmielinski, Kasia}, + doi = {10.5040/9781509932771.ch-001}, + journal = {Data Protection and Privacy}, + title = {The Dataset Nutrition label}, + year = 
2020, + Bdsk-Url-1 = {https://doi.org/10.5040/9781509932771.ch-001}} @inproceedings{hong2023publishing, - title = {Publishing Efficient On-device Models Increases Adversarial Vulnerability}, - author = {Hong, Sanghyun and Carlini, Nicholas and Kurakin, Alexey}, - year = 2023, - booktitle = {2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)}, - pages = {271--290}, + author = {Hong, Sanghyun and Carlini, Nicholas and Kurakin, Alexey}, + booktitle = {2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)}, organization = {IEEE}, -} + pages = {271--290}, + title = {Publishing Efficient On-device Models Increases Adversarial Vulnerability}, + year = 2023} @misc{howard_mobilenets_2017, - title = {{MobileNets}: {Efficient} {Convolutional} {Neural} {Networks} for {Mobile} {Vision} {Applications}}, - author = {Howard, Andrew G. and Zhu, Menglong and Chen, Bo and Kalenichenko, Dmitry and Wang, Weijun and Weyand, Tobias and Andreetto, Marco and Adam, Hartwig}, - year = 2017, - month = apr, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1704.04861}, - urldate = {2023-10-20}, - note = {arXiv:1704.04861 [cs]}, -} + abstract = {We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. 
We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.}, + author = {Howard, Andrew G. and Zhu, Menglong and Chen, Bo and Kalenichenko, Dmitry and Wang, Weijun and Weyand, Tobias and Andreetto, Marco and Adam, Hartwig}, + doi = {10.48550/arXiv.1704.04861}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/IJ9P9ID9/Howard et al. - 2017 - MobileNets Efficient Convolutional Neural Network.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/D9TS95GJ/1704.html:text/html}, + keywords = {Computer Science - Computer Vision and Pattern Recognition}, + month = apr, + note = {arXiv:1704.04861 [cs]}, + publisher = {arXiv}, + shorttitle = {{MobileNets}}, + title = {{MobileNets}: {Efficient} {Convolutional} {Neural} {Networks} for {Mobile} {Vision} {Applications}}, + url = {http://arxiv.org/abs/1704.04861}, + urldate = {2023-10-20}, + year = 2017, + Bdsk-Url-1 = {http://arxiv.org/abs/1704.04861}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1704.04861}} @misc{howard2017mobilenets, - title = {MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications}, - author = {Andrew G. Howard and Menglong Zhu and Bo Chen and Dmitry Kalenichenko and Weijun Wang and Tobias Weyand and Marco Andreetto and Hartwig Adam}, - year = 2017, - journal = {arXiv preprint arXiv:1704.04861}, - eprint = {1704.04861}, archiveprefix = {arXiv}, + author = {Andrew G. 
Howard and Menglong Zhu and Bo Chen and Dmitry Kalenichenko and Weijun Wang and Tobias Weyand and Marco Andreetto and Hartwig Adam}, + eprint = {1704.04861}, + journal = {arXiv preprint arXiv:1704.04861}, primaryclass = {cs.CV}, -} - -@article{huang2010pseudo, - title = {Pseudo-CMOS: A design style for low-cost and robust flexible electronics}, - author = {Huang, Tsung-Ching and Fukuda, Kenjiro and Lo, Chun-Ming and Yeh, Yung-Hui and Sekitani, Tsuyoshi and Someya, Takao and Cheng, Kwang-Ting}, - year = 2010, - journal = {IEEE Transactions on Electron Devices}, - publisher = {IEEE}, - volume = 58, - number = 1, - pages = {141--150}, -} + title = {MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications}, + year = 2017} @misc{iandola_squeezenet_2016, - title = {{SqueezeNet}: {AlexNet}-level accuracy with 50x fewer parameters and {\textless}0.{5MB} model size}, - author = {Iandola, Forrest N. and Han, Song and Moskewicz, Matthew W. and Ashraf, Khalid and Dally, William J. and Keutzer, Kurt}, - year = 2016, - month = nov, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1602.07360}, - urldate = {2023-10-20}, - note = {arXiv:1602.07360 [cs]}, -} + abstract = {Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. 
Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet}, + author = {Iandola, Forrest N. and Han, Song and Moskewicz, Matthew W. and Ashraf, Khalid and Dally, William J. and Keutzer, Kurt}, + doi = {10.48550/arXiv.1602.07360}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/X3ZX9UTZ/Iandola et al. - 2016 - SqueezeNet AlexNet-level accuracy with 50x fewer .pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/DHI96QVT/1602.html:text/html}, + keywords = {Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition}, + month = nov, + note = {arXiv:1602.07360 [cs]}, + publisher = {arXiv}, + shorttitle = {{SqueezeNet}}, + title = {{SqueezeNet}: {AlexNet}-level accuracy with 50x fewer parameters and {\textless}0.{5MB} model size}, + url = {http://arxiv.org/abs/1602.07360}, + urldate = {2023-10-20}, + year = 2016, + Bdsk-Url-1 = {http://arxiv.org/abs/1602.07360}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1602.07360}} @article{iandola2016squeezenet, - title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size}, - author = {Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt}, - year = 2016, - journal = {arXiv preprint arXiv:1602.07360}, -} - -@article{Ignatov2018-kh, - title = {{AI} Benchmark: Running deep neural networks on Android smartphones}, - author = {Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and Hartley, Tim and Van Gool, Luc}, - year = 2018, - publisher = {arXiv}, -} + author = {Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt}, + journal = {arXiv preprint arXiv:1602.07360}, + title = {SqueezeNet: AlexNet-level accuracy 
with 50x fewer parameters and< 0.5 MB model size}, + year = 2016} @inproceedings{ignatov2018ai, - title = {Ai benchmark: Running deep neural networks on android smartphones}, - author = {Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and Hartley, Tim and Van Gool, Luc}, - year = 2018, - booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops}, - pages = {0--0}, -} + author = {Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and Hartley, Tim and Van Gool, Luc}, + booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops}, + pages = {0--0}, + title = {Ai benchmark: Running deep neural networks on android smartphones}, + year = 2018} @inproceedings{ijcai2021p592, - title = {Hardware-Aware Neural Architecture Search: Survey and Taxonomy}, - author = {Benmeziane, Hadjer and El Maghraoui, Kaoutar and Ouarnoughi, Hamza and Niar, Smail and Wistuba, Martin and Wang, Naigang}, - year = 2021, - month = aug, - booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, - publisher = {International Joint Conferences on Artificial Intelligence Organization}, - pages = {4322--4329}, - url = {https://doi.org/10.24963/ijcai.2021/592}, - note = {Survey Track}, - editor = {Zhi-Hua Zhou}, -} - -@inproceedings{imani2016resistive, - title = {Resistive configurable associative memory for approximate computing}, - author = {Imani, Mohsen and Rahimi, Abbas and Rosing, Tajana S}, - year = 2016, - booktitle = {2016 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, - pages = {1327--1332}, - organization = {IEEE}, -} + author = {Benmeziane, Hadjer and El Maghraoui, Kaoutar and Ouarnoughi, Hamza and Niar, Smail and Wistuba, Martin and Wang, Naigang}, + booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, + doi = {10.24963/ijcai.2021/592}, + 
editor = {Zhi-Hua Zhou}, + month = 8, + note = {Survey Track}, + pages = {4322--4329}, + publisher = {International Joint Conferences on Artificial Intelligence Organization}, + title = {Hardware-Aware Neural Architecture Search: Survey and Taxonomy}, + url = {https://doi.org/10.24963/ijcai.2021/592}, + year = 2021, + Bdsk-Url-1 = {https://doi.org/10.24963/ijcai.2021/592}} @misc{intquantfordeepinf, - title = {Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation)}, - author = {Wu and Judd, Zhang and Isaev, Micikevicius}, - year = 2020, - url = {https://arxiv.org/abs/2004.09602}, - urldate = {2020-04-20}, -} - -@inproceedings{jacob2018quantization, - title = {Quantization and training of neural networks for efficient integer-arithmetic-only inference}, - author = {Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry}, - year = 2018, - booktitle = {Proceedings of the IEEE conference on computer vision and pattern recognition}, - pages = {2704--2713}, -} - -@article{janapa2023edge, - title = {Edge Impulse: An MLOps Platform for Tiny Machine Learning}, - author = {Janapa Reddi, Vijay and Elium, Alexander and Hymel, Shawn and Tischler, David and Situnayake, Daniel and Ward, Carl and Moreau, Louis and Plunkett, Jenny and Kelcey, Matthew and Baaijens, Mathijs and others}, - year = 2023, - journal = {Proceedings of Machine Learning and Systems}, - volume = 5, -} - -@misc{jia_dissecting_2018, - title = {Dissecting the {NVIDIA} {Volta} {GPU} {Architecture} via {Microbenchmarking}}, - author = {Jia, Zhe and Maggioni, Marco and Staiger, Benjamin and Scarpazza, Daniele P.}, - year = 2018, - month = apr, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1804.06826}, - urldate = {2023-11-07}, - note = {arXiv:1804.06826 [cs]}, -} + author = {Wu and Judd, Zhang and Isaev, Micikevicius}, + doi = {10.48550/arXiv.2004.09602}, + title = {Integer Quantization 
for Deep Learning Inference: Principles and Empirical Evaluation)}, + url = {https://arxiv.org/abs/2004.09602}, + urldate = {2020-04-20}, + year = 2020, + Bdsk-Url-1 = {https://arxiv.org/abs/2004.09602}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2004.09602}} @inproceedings{jia2014caffe, - title = {Caffe: Convolutional architecture for fast feature embedding}, - author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor}, - year = 2014, - booktitle = {Proceedings of the 22nd ACM international conference on Multimedia}, - pages = {675--678}, -} - -@article{jia2019beyond, - title = {Beyond Data and Model Parallelism for Deep Neural Networks.}, - author = {Jia, Zhihao and Zaharia, Matei and Aiken, Alex}, - year = 2019, - journal = {Proceedings of Machine Learning and Systems}, - volume = 1, - pages = {1--13}, -} + author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor}, + booktitle = {Proceedings of the 22nd ACM international conference on Multimedia}, + pages = {675--678}, + title = {Caffe: Convolutional architecture for fast feature embedding}, + year = 2014} @article{jia2023life, - title = {Life-threatening ventricular arrhythmia detection challenge in implantable cardioverter--defibrillators}, - author = {Jia, Zhenge and Li, Dawei and Xu, Xiaowei and Li, Na and Hong, Feng and Ping, Lichuan and Shi, Yiyu}, - year = 2023, - journal = {Nature Machine Intelligence}, - publisher = {Nature Publishing Group UK London}, - volume = 5, - number = 5, - pages = {554--555}, -} + author = {Jia, Zhenge and Li, Dawei and Xu, Xiaowei and Li, Na and Hong, Feng and Ping, Lichuan and Shi, Yiyu}, + journal = {Nature Machine Intelligence}, + number = 5, + pages = {554--555}, + publisher = {Nature Publishing Group UK London}, + title = {Life-threatening ventricular 
arrhythmia detection challenge in implantable cardioverter--defibrillators}, + volume = 5, + year = 2023} @misc{jiang2019accuracy, - title = {Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search}, - author = {Weiwen Jiang and Xinyi Zhang and Edwin H. -M. Sha and Lei Yang and Qingfeng Zhuge and Yiyu Shi and Jingtong Hu}, - year = 2019, - eprint = {1901.11211}, archiveprefix = {arXiv}, + author = {Weiwen Jiang and Xinyi Zhang and Edwin H. -M. Sha and Lei Yang and Qingfeng Zhuge and Yiyu Shi and Jingtong Hu}, + eprint = {1901.11211}, primaryclass = {cs.DC}, -} + title = {Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search}, + year = 2019} @article{Johnson-Roberson_Barto_Mehta_Sridhar_Rosaen_Vasudevan_2017, - title = {Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?}, - author = {Johnson-Roberson, Matthew and Barto, Charles and Mehta, Rounak and Sridhar, Sharath Nittur and Rosaen, Karl and Vasudevan, Ram}, - year = 2017, - journal = {2017 IEEE International Conference on Robotics and Automation (ICRA)}, -} + author = {Johnson-Roberson, Matthew and Barto, Charles and Mehta, Rounak and Sridhar, Sharath Nittur and Rosaen, Karl and Vasudevan, Ram}, + doi = {10.1109/icra.2017.7989092}, + journal = {2017 IEEE International Conference on Robotics and Automation (ICRA)}, + title = {Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?}, + year = 2017, + Bdsk-Url-1 = {https://doi.org/10.1109/icra.2017.7989092}} @article{jordan_machine_2015, - title = {Machine learning: {Trends}, perspectives, and prospects}, - author = {Jordan, M. I. and Mitchell, T. 
M.}, - year = 2015, - month = jul, - journal = {Science}, - volume = 349, - number = 6245, - pages = {255--260}, - url = {https://www.science.org/doi/10.1126/science.aaa8415}, - urldate = {2023-10-25}, - language = {en}, -} + author = {Jordan, M. I. and Mitchell, T. M.}, + doi = {10.1126/science.aaa8415}, + file = {Jordan and Mitchell - 2015 - Machine learning Trends, perspectives, and prospe.pdf:/Users/alex/Zotero/storage/RGU3CQ4Q/Jordan and Mitchell - 2015 - Machine learning Trends, perspectives, and prospe.pdf:application/pdf}, + issn = {0036-8075, 1095-9203}, + journal = {Science}, + language = {en}, + month = jul, + number = 6245, + pages = {255--260}, + shorttitle = {Machine learning}, + title = {Machine learning: {Trends}, perspectives, and prospects}, + url = {https://www.science.org/doi/10.1126/science.aaa8415}, + urldate = {2023-10-25}, + volume = 349, + year = 2015, + Bdsk-Url-1 = {https://www.science.org/doi/10.1126/science.aaa8415}, + Bdsk-Url-2 = {https://doi.org/10.1126/science.aaa8415}} @inproceedings{jouppi2017datacenter, - title = {In-datacenter performance analysis of a tensor processing unit}, - author = {Jouppi, Norman P and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and others}, - year = 2017, - booktitle = {Proceedings of the 44th annual international symposium on computer architecture}, - pages = {1--12}, -} - -@inproceedings{Jouppi2023TPUv4, - title = {TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings}, - author = {Jouppi, Norm and Kurian, George and Li, Sheng and Ma, Peter and Nagarajan, Rahul and Nai, Lifeng and Patil, Nishant and Subramanian, Suvinay and Swing, Andy and Towles, Brian and Young, Clifford and Zhou, Xiang and Zhou, Zongwei and Patterson, David A}, - year = 2023, - booktitle = {Proceedings of the 50th Annual International Symposium on Computer 
Architecture}, - location = {Orlando, FL, USA}, - publisher = {Association for Computing Machinery}, - address = {New York, NY, USA}, - series = {ISCA '23}, - isbn = 9798400700958, - url = {https://doi.org/10.1145/3579371.3589350}, - articleno = 82, - numpages = 14, -} + author = {Jouppi, Norman P and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and others}, + booktitle = {Proceedings of the 44th annual international symposium on computer architecture}, + pages = {1--12}, + title = {In-datacenter performance analysis of a tensor processing unit}, + year = 2017} @article{kairouz2015secure, - title = {Secure multi-party differential privacy}, - author = {Kairouz, Peter and Oh, Sewoong and Viswanath, Pramod}, - year = 2015, - journal = {Advances in neural information processing systems}, - volume = 28, -} + author = {Kairouz, Peter and Oh, Sewoong and Viswanath, Pramod}, + journal = {Advances in neural information processing systems}, + title = {Secure multi-party differential privacy}, + volume = 28, + year = 2015} @article{karargyris2023federated, - title = {Federated benchmarking of medical artificial intelligence with MedPerf}, - author = {Karargyris, Alexandros and Umeton, Renato and Sheller, Micah J and Aristizabal, Alejandro and George, Johnu and Wuest, Anna and Pati, Sarthak and Kassem, Hasan and Zenk, Maximilian and Baid, Ujjwal and others}, - year = 2023, - journal = {Nature Machine Intelligence}, - publisher = {Nature Publishing Group UK London}, - volume = 5, - number = 7, - pages = {799--810}, -} + author = {Karargyris, Alexandros and Umeton, Renato and Sheller, Micah J and Aristizabal, Alejandro and George, Johnu and Wuest, Anna and Pati, Sarthak and Kassem, Hasan and Zenk, Maximilian and Baid, Ujjwal and others}, + journal = {Nature Machine Intelligence}, + number = 7, + pages = {799--810}, + publisher = {Nature Publishing Group UK 
London}, + title = {Federated benchmarking of medical artificial intelligence with MedPerf}, + volume = 5, + year = 2023} @article{kiela2021dynabench, - title = {Dynabench: Rethinking benchmarking in NLP}, - author = {Kiela, Douwe and Bartolo, Max and Nie, Yixin and Kaushik, Divyansh and Geiger, Atticus and Wu, Zhengxuan and Vidgen, Bertie and Prasad, Grusha and Singh, Amanpreet and Ringshia, Pratik and others}, - year = 2021, - journal = {arXiv preprint arXiv:2104.14337}, -} + author = {Kiela, Douwe and Bartolo, Max and Nie, Yixin and Kaushik, Divyansh and Geiger, Atticus and Wu, Zhengxuan and Vidgen, Bertie and Prasad, Grusha and Singh, Amanpreet and Ringshia, Pratik and others}, + journal = {arXiv preprint arXiv:2104.14337}, + title = {Dynabench: Rethinking benchmarking in NLP}, + year = 2021} @inproceedings{koh2021wilds, - title = {Wilds: A benchmark of in-the-wild distribution shifts}, - author = {Koh, Pang Wei and Sagawa, Shiori and Marklund, Henrik and Xie, Sang Michael and Zhang, Marvin and Balsubramani, Akshay and Hu, Weihua and Yasunaga, Michihiro and Phillips, Richard Lanas and Gao, Irena and others}, - year = 2021, - booktitle = {International Conference on Machine Learning}, - pages = {5637--5664}, + author = {Koh, Pang Wei and Sagawa, Shiori and Marklund, Henrik and Xie, Sang Michael and Zhang, Marvin and Balsubramani, Akshay and Hu, Weihua and Yasunaga, Michihiro and Phillips, Richard Lanas and Gao, Irena and others}, + booktitle = {International Conference on Machine Learning}, organization = {PMLR}, -} + pages = {5637--5664}, + title = {Wilds: A benchmark of in-the-wild distribution shifts}, + year = 2021} @article{kolda_tensor_2009, - title = {Tensor {Decompositions} and {Applications}}, - author = {Kolda, Tamara G. 
and Bader, Brett W.}, - year = 2009, - month = aug, - journal = {SIAM Review}, - volume = 51, - number = 3, - pages = {455--500}, - url = {http://epubs.siam.org/doi/10.1137/07070111X}, - urldate = {2023-10-20}, - language = {en}, -} + abstract = {This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N -way array. Decompositions of higher-order tensors (i.e., N -way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors.}, + author = {Kolda, Tamara G. 
and Bader, Brett W.}, + doi = {10.1137/07070111X}, + file = {Kolda and Bader - 2009 - Tensor Decompositions and Applications.pdf:/Users/jeffreyma/Zotero/storage/Q7ZG2267/Kolda and Bader - 2009 - Tensor Decompositions and Applications.pdf:application/pdf}, + issn = {0036-1445, 1095-7200}, + journal = {SIAM Review}, + language = {en}, + month = aug, + number = 3, + pages = {455--500}, + title = {Tensor {Decompositions} and {Applications}}, + url = {http://epubs.siam.org/doi/10.1137/07070111X}, + urldate = {2023-10-20}, + volume = 51, + year = 2009, + Bdsk-Url-1 = {http://epubs.siam.org/doi/10.1137/07070111X}, + Bdsk-Url-2 = {https://doi.org/10.1137/07070111X}} @article{koshti2011cumulative, - title = {Cumulative sum control chart}, - author = {Koshti, VV}, - year = 2011, - journal = {International journal of physics and mathematical sciences}, - volume = 1, - number = 1, - pages = {28--32}, -} + author = {Koshti, VV}, + journal = {International journal of physics and mathematical sciences}, + number = 1, + pages = {28--32}, + title = {Cumulative sum control chart}, + volume = 1, + year = 2011} @misc{krishna2023raman, - title = {RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge}, - author = {Adithya Krishna and Srikanth Rohit Nudurupati and Chandana D G and Pritesh Dwivedi and Andr\'{e} van Schaik and Mahesh Mehendale and Chetan Singh Thakur}, - year = 2023, - eprint = {2306.06493}, archiveprefix = {arXiv}, + author = {Adithya Krishna and Srikanth Rohit Nudurupati and Chandana D G and Pritesh Dwivedi and Andr{\'e} van Schaik and Mahesh Mehendale and Chetan Singh Thakur}, + eprint = {2306.06493}, primaryclass = {cs.NE}, -} + title = {RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge}, + year = 2023} @article{krishnamoorthi2018quantizing, - title = {Quantizing deep convolutional networks for efficient inference: A whitepaper}, - author = {Krishnamoorthi, Raghuraman}, - year = 2018, - journal = {arXiv preprint 
arXiv:1806.08342}, -} + author = {Krishnamoorthi, Raghuraman}, + journal = {arXiv preprint arXiv:1806.08342}, + title = {Quantizing deep convolutional networks for efficient inference: A whitepaper}, + year = 2018} @article{Krishnan_Rajpurkar_Topol_2022, - title = {Self-supervised learning in medicine and Healthcare}, - author = {Krishnan, Rayan and Rajpurkar, Pranav and Topol, Eric J.}, - year = 2022, - journal = {Nature Biomedical Engineering}, - volume = 6, - number = 12, - pages = {1346–1352}, -} - -@inproceedings{krishnan2023archgym, - title = {ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design}, - author = {Krishnan, Srivatsan and Yazdanbakhsh, Amir and Prakash, Shvetank and Jabbour, Jason and Uchendu, Ikechukwu and Ghosh, Susobhan and Boroujerdian, Behzad and Richins, Daniel and Tripathy, Devashree and Faust, Aleksandra and Janapa Reddi, Vijay}, - year = 2023, - booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture}, - pages = {1--16}, -} + author = {Krishnan, Rayan and Rajpurkar, Pranav and Topol, Eric J.}, + doi = {10.1038/s41551-022-00914-1}, + journal = {Nature Biomedical Engineering}, + number = 12, + pages = {1346--1352}, + title = {Self-supervised learning in medicine and Healthcare}, + volume = 6, + year = 2022, + Bdsk-Url-1 = {https://doi.org/10.1038/s41551-022-00914-1}} @article{krizhevsky2012imagenet, - title = {Imagenet classification with deep convolutional neural networks}, - author = {Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E}, - year = 2012, - journal = {Advances in neural information processing systems}, - volume = 25, -} + author = {Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E}, + journal = {Advances in neural information processing systems}, + title = {Imagenet classification with deep convolutional neural networks}, + volume = 25, + year = 2012} @inproceedings{kung1979systolic, - title = {Systolic arrays (for VLSI)}, - author = {Kung, 
Hsiang Tsung and Leiserson, Charles E}, - year = 1979, - booktitle = {Sparse Matrix Proceedings 1978}, - volume = 1, - pages = {256--282}, + author = {Kung, Hsiang Tsung and Leiserson, Charles E}, + booktitle = {Sparse Matrix Proceedings 1978}, organization = {Society for industrial and applied mathematics Philadelphia, PA, USA}, -} + pages = {256--282}, + title = {Systolic arrays (for VLSI)}, + volume = 1, + year = 1979} @misc{kung2018packing, - title = {Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization}, - author = {H. T. Kung and Bradley McDanel and Sai Qian Zhang}, - year = 2018, - eprint = {1811.04770}, archiveprefix = {arXiv}, + author = {H. T. Kung and Bradley McDanel and Sai Qian Zhang}, + eprint = {1811.04770}, primaryclass = {cs.LG}, -} + title = {Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization}, + year = 2018} @incollection{kurkova_survey_2018, - title = {A {Survey} on {Deep} {Transfer} {Learning}}, - author = {Tan, Chuanqi and Sun, Fuchun and Kong, Tao and Zhang, Wenchang and Yang, Chao and Liu, Chunfang}, - year = 2018, - booktitle = {Artificial {Neural} {Networks} and {Machine} {Learning} – {ICANN} 2018}, - publisher = {Springer International Publishing}, - address = {Cham}, - volume = 11141, - pages = {270--279}, - isbn = {978-3-030-01423-0 978-3-030-01424-7}, - url = {http://link.springer.com/10.1007/978-3-030-01424-7\_27}, - urldate = {2023-10-26}, - note = {Series Title: Lecture Notes in Computer Science}, - language = {en}, - editor = {K\r{u}rkov\'{a}, V\v{e}ra and Manolopoulos, Yannis and Hammer, Barbara and Iliadis, Lazaros and Maglogiannis, Ilias}, -} + address = {Cham}, + author = {Tan, Chuanqi and Sun, Fuchun and Kong, Tao and Zhang, Wenchang and Yang, Chao and Liu, Chunfang}, + booktitle = {Artificial {Neural} {Networks} and {Machine} {Learning} -- {ICANN} 2018}, + doi = 
{10.1007/978-3-030-01424-7_27}, + editor = {K{\r u}rkov{\'a}, V{\v e}ra and Manolopoulos, Yannis and Hammer, Barbara and Iliadis, Lazaros and Maglogiannis, Ilias}, + file = {Tan et al. - 2018 - A Survey on Deep Transfer Learning.pdf:/Users/alex/Zotero/storage/5NZ36SGB/Tan et al. - 2018 - A Survey on Deep Transfer Learning.pdf:application/pdf}, + isbn = {978-3-030-01423-0 978-3-030-01424-7}, + language = {en}, + note = {Series Title: Lecture Notes in Computer Science}, + pages = {270--279}, + publisher = {Springer International Publishing}, + title = {A {Survey} on {Deep} {Transfer} {Learning}}, + url = {http://link.springer.com/10.1007/978-3-030-01424-7_27}, + urldate = {2023-10-26}, + volume = 11141, + year = 2018, + Bdsk-Url-1 = {http://link.springer.com/10.1007/978-3-030-01424-7_27}, + Bdsk-Url-2 = {https://doi.org/10.1007/978-3-030-01424-7_27}} @misc{kuzmin2022fp8, - title = {FP8 Quantization: The Power of the Exponent}, - author = {Andrey Kuzmin and Mart Van Baalen and Yuwei Ren and Markus Nagel and Jorn Peters and Tijmen Blankevoort}, - year = 2022, - eprint = {2208.09225}, archiveprefix = {arXiv}, + author = {Andrey Kuzmin and Mart Van Baalen and Yuwei Ren and Markus Nagel and Jorn Peters and Tijmen Blankevoort}, + eprint = {2208.09225}, primaryclass = {cs.LG}, -} + title = {FP8 Quantization: The Power of the Exponent}, + year = 2022} @misc{kwon_tinytrain_2023, - title = {{TinyTrain}: {Deep} {Neural} {Network} {Training} at the {Extreme} {Edge}}, - author = {Kwon, Young D. and Li, Rui and Venieris, Stylianos I. and Chauhan, Jagmohan and Lane, Nicholas D. 
and Mascolo, Cecilia}, - year = 2023, - month = jul, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2307.09988}, - urldate = {2023-10-26}, - note = {arXiv:2307.09988 [cs]}, - language = {en}, -} - -@article{kwon2022flexible, - title = {Flexible sensors and machine learning for heart monitoring}, - author = {Kwon, Sun Hwa and Dong, Lin}, - year = 2022, - journal = {Nano Energy}, - publisher = {Elsevier}, - pages = 107632, -} + author = {Kwon, Young D. and Li, Rui and Venieris, Stylianos I. and Chauhan, Jagmohan and Lane, Nicholas D. and Mascolo, Cecilia}, + file = {Kwon et al. - 2023 - TinyTrain Deep Neural Network Training at the Ext.pdf:/Users/alex/Zotero/storage/L2ST472U/Kwon et al. - 2023 - TinyTrain Deep Neural Network Training at the Ext.pdf:application/pdf}, + keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning}, + language = {en}, + month = jul, + note = {arXiv:2307.09988 [cs]}, + publisher = {arXiv}, + shorttitle = {{TinyTrain}}, + title = {{TinyTrain}: {Deep} {Neural} {Network} {Training} at the {Extreme} {Edge}}, + url = {http://arxiv.org/abs/2307.09988}, + urldate = {2023-10-26}, + year = 2023, + Bdsk-Url-1 = {http://arxiv.org/abs/2307.09988}} @article{kwon2023tinytrain, - title = {TinyTrain: Deep Neural Network Training at the Extreme Edge}, - author = {Kwon, Young D and Li, Rui and Venieris, Stylianos I and Chauhan, Jagmohan and Lane, Nicholas D and Mascolo, Cecilia}, - year = 2023, - journal = {arXiv preprint arXiv:2307.09988}, -} + author = {Kwon, Young D and Li, Rui and Venieris, Stylianos I and Chauhan, Jagmohan and Lane, Nicholas D and Mascolo, Cecilia}, + journal = {arXiv preprint arXiv:2307.09988}, + title = {TinyTrain: Deep Neural Network Training at the Extreme Edge}, + year = 2023} @misc{Labelbox, - journal = {Labelbox}, - url = {https://labelbox.com/}, -} + journal = {Labelbox}, + url = {https://labelbox.com/}, + Bdsk-Url-1 = {https://labelbox.com/}} @article{lai2018cmsis, - title 
= {Cmsis-nn: Efficient neural network kernels for arm cortex-m cpus}, - author = {Lai, Liangzhen and Suda, Naveen and Chandra, Vikas}, - year = 2018, - journal = {arXiv preprint arXiv:1801.06601}, -} + author = {Lai, Liangzhen and Suda, Naveen and Chandra, Vikas}, + journal = {arXiv preprint arXiv:1801.06601}, + title = {Cmsis-nn: Efficient neural network kernels for arm cortex-m cpus}, + year = 2018} @misc{lai2018cmsisnn, - title = {CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs}, - author = {Liangzhen Lai and Naveen Suda and Vikas Chandra}, - year = 2018, - eprint = {1801.06601}, archiveprefix = {arXiv}, + author = {Liangzhen Lai and Naveen Suda and Vikas Chandra}, + eprint = {1801.06601}, primaryclass = {cs.NE}, -} + title = {CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs}, + year = 2018} @inproceedings{lecun_optimal_1989, - title = {Optimal {Brain} {Damage}}, - author = {LeCun, Yann and Denker, John and Solla, Sara}, - year = 1989, - booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, - publisher = {Morgan-Kaufmann}, - volume = 2, - url = {https://proceedings.neurips.cc/paper/1989/hash/6c9882bbac1c7093bd25041881277658-Abstract.html}, - urldate = {2023-10-20}, -} + abstract = {We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error.
Experiments confirm the usefulness of the methods on a real-world application.}, + author = {LeCun, Yann and Denker, John and Solla, Sara}, + booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, + file = {Full Text PDF:/Users/jeffreyma/Zotero/storage/BYHQQSST/LeCun et al. - 1989 - Optimal Brain Damage.pdf:application/pdf}, + publisher = {Morgan-Kaufmann}, + title = {Optimal {Brain} {Damage}}, + url = {https://proceedings.neurips.cc/paper/1989/hash/6c9882bbac1c7093bd25041881277658-Abstract.html}, + urldate = {2023-10-20}, + volume = 2, + year = 1989, + Bdsk-Url-1 = {https://proceedings.neurips.cc/paper/1989/hash/6c9882bbac1c7093bd25041881277658-Abstract.html}} @article{lecun1989optimal, - title = {Optimal brain damage}, - author = {LeCun, Yann and Denker, John and Solla, Sara}, - year = 1989, - journal = {Advances in neural information processing systems}, - volume = 2, -} + author = {LeCun, Yann and Denker, John and Solla, Sara}, + journal = {Advances in neural information processing systems}, + title = {Optimal brain damage}, + volume = 2, + year = 1989} @article{li2014communication, - title = {Communication efficient distributed machine learning with the parameter server}, - author = {Li, Mu and Andersen, David G and Smola, Alexander J and Yu, Kai}, - year = 2014, - journal = {Advances in Neural Information Processing Systems}, - volume = 27, -} + author = {Li, Mu and Andersen, David G and Smola, Alexander J and Yu, Kai}, + journal = {Advances in Neural Information Processing Systems}, + title = {Communication efficient distributed machine learning with the parameter server}, + volume = 27, + year = 2014} @article{li2016lightrnn, - title = {LightRNN: Memory and computation-efficient recurrent neural networks}, - author = {Li, Xiang and Qin, Tao and Yang, Jian and Liu, Tie-Yan}, - year = 2016, - journal = {Advances in Neural Information Processing Systems}, - volume = 29, -} + author = {Li, Xiang and Qin, Tao and Yang, Jian and Liu, Tie-Yan}, + 
journal = {Advances in Neural Information Processing Systems}, + title = {LightRNN: Memory and computation-efficient recurrent neural networks}, + volume = 29, + year = 2016} @article{li2017deep, - title = {Deep reinforcement learning: An overview}, - author = {Li, Yuxi}, - year = 2017, - journal = {arXiv preprint arXiv:1701.07274}, -} + author = {Li, Yuxi}, + journal = {arXiv preprint arXiv:1701.07274}, + title = {Deep reinforcement learning: An overview}, + year = 2017} @article{li2017learning, - title = {Learning without forgetting}, - author = {Li, Zhizhong and Hoiem, Derek}, - year = 2017, - journal = {IEEE transactions on pattern analysis and machine intelligence}, - publisher = {IEEE}, - volume = 40, - number = 12, - pages = {2935--2947}, -} + author = {Li, Zhizhong and Hoiem, Derek}, + journal = {IEEE transactions on pattern analysis and machine intelligence}, + number = 12, + pages = {2935--2947}, + publisher = {IEEE}, + title = {Learning without forgetting}, + volume = 40, + year = 2017} @article{li2019edge, - title = {Edge AI: On-demand accelerating deep neural network inference via edge computing}, - author = {Li, En and Zeng, Liekang and Zhou, Zhi and Chen, Xu}, - year = 2019, - journal = {IEEE Transactions on Wireless Communications}, - publisher = {IEEE}, - volume = 19, - number = 1, - pages = {447--457}, -} - -@inproceedings{Li2020Additive, - title = {Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks}, - author = {Yuhang Li and Xin Dong and Wei Wang}, - year = 2020, - booktitle = {International Conference on Learning Representations}, - url = {https://openreview.net/forum?id=BkgXT24tDS}, -} + author = {Li, En and Zeng, Liekang and Zhou, Zhi and Chen, Xu}, + journal = {IEEE Transactions on Wireless Communications}, + number = 1, + pages = {447--457}, + publisher = {IEEE}, + title = {Edge AI: On-demand accelerating deep neural network inference via edge computing}, + volume = 19, + year = 2019} 
@misc{liao_can_2023, - title = {Can {Unstructured} {Pruning} {Reduce} the {Depth} in {Deep} {Neural} {Networks}?}, - author = {Liao, Zhu and Qu\'{e}tu, Victor and Nguyen, Van-Tam and Tartaglione, Enzo}, - year = 2023, - month = aug, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2308.06619}, - urldate = {2023-10-20}, - note = {arXiv:2308.06619 [cs]}, -} + abstract = {Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance. However, such a technique, despite being able to massively compress deep models, is hardly able to remove entire layers from a model (even when structured): is this an addressable task? In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducing the size of deep neural networks while preserving their performance. The key focus of EGP is to prioritize pruning connections in layers with low entropy, ultimately leading to their complete removal. Through extensive experiments conducted on popular models like ResNet-18 and Swin-T, our findings demonstrate that EGP effectively compresses deep neural networks while maintaining competitive performance levels. Our results not only shed light on the underlying mechanism behind the advantages of unstructured pruning, but also pave the way for further investigations into the intricate relationship between entropy, pruning techniques, and deep learning performance. The EGP algorithm and its insights hold great promise for advancing the field of network compression and optimization. The source code for EGP is released open-source.}, + author = {Liao, Zhu and Qu{\'e}tu, Victor and Nguyen, Van-Tam and Tartaglione, Enzo}, + doi = {10.48550/arXiv.2308.06619}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/V6P3XB5H/Liao et al. 
- 2023 - Can Unstructured Pruning Reduce the Depth in Deep .pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/WSQ4ZUH4/2308.html:text/html}, + keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning}, + month = aug, + note = {arXiv:2308.06619 [cs]}, + publisher = {arXiv}, + title = {Can {Unstructured} {Pruning} {Reduce} the {Depth} in {Deep} {Neural} {Networks}?}, + url = {http://arxiv.org/abs/2308.06619}, + urldate = {2023-10-20}, + year = 2023, + Bdsk-Url-1 = {http://arxiv.org/abs/2308.06619}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2308.06619}} @misc{lin_-device_2022, - title = {On-{Device} {Training} {Under} {256KB} {Memory}}, - author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, - year = 2022, - month = nov, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2206.15472}, - urldate = {2023-10-26}, - note = {arXiv:2206.15472 [cs]}, - language = {en}, -} + annote = {Comment: NeurIPS 2022}, + author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, + file = {Lin et al. - 2022 - On-Device Training Under 256KB Memory.pdf:/Users/alex/Zotero/storage/GMF6SWGT/Lin et al. 
- 2022 - On-Device Training Under 256KB Memory.pdf:application/pdf}, + keywords = {Computer Science - Computer Vision and Pattern Recognition}, + language = {en}, + month = nov, + note = {arXiv:2206.15472 [cs]}, + publisher = {arXiv}, + title = {On-{Device} {Training} {Under} {256KB} {Memory}}, + url = {http://arxiv.org/abs/2206.15472}, + urldate = {2023-10-26}, + year = 2022, + Bdsk-Url-1 = {http://arxiv.org/abs/2206.15472}} @misc{lin_-device_2022-1, - title = {On-{Device} {Training} {Under} {256KB} {Memory}}, - author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, - year = 2022, - month = nov, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2206.15472}, - urldate = {2023-10-25}, - note = {arXiv:2206.15472 [cs]}, - language = {en}, -} + annote = {Comment: NeurIPS 2022}, + author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, + file = {Lin et al. - 2022 - On-Device Training Under 256KB Memory.pdf:/Users/alex/Zotero/storage/DNIY32R2/Lin et al. 
- 2022 - On-Device Training Under 256KB Memory.pdf:application/pdf}, + keywords = {Computer Science - Computer Vision and Pattern Recognition}, + language = {en}, + month = nov, + note = {arXiv:2206.15472 [cs]}, + publisher = {arXiv}, + title = {On-{Device} {Training} {Under} {256KB} {Memory}}, + url = {http://arxiv.org/abs/2206.15472}, + urldate = {2023-10-25}, + year = 2022, + Bdsk-Url-1 = {http://arxiv.org/abs/2206.15472}} @misc{lin_mcunet_2020, - title = {{MCUNet}: {Tiny} {Deep} {Learning} on {IoT} {Devices}}, - author = {Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Cohn, John and Gan, Chuang and Han, Song}, - year = 2020, - month = nov, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2007.10319}, - urldate = {2023-10-20}, - note = {arXiv:2007.10319 [cs]}, - language = {en}, -} + abstract = {Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space. TinyNAS can automatically handle diverse constraints (i.e., device, latency, energy, memory) under low search costs. TinyNAS is co-designed with TinyEngine, a memory-efficient inference library to expand the search space and fit a larger model. TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x, and accelerating the inference by 1.7-3.3x compared to TF-Lite Micro and CMSIS-NN.
MCUNet is the first to achieve {\textgreater}70\% ImageNet top1 accuracy on an off-the-shelf commercial microcontroller, using 3.5x less SRAM and 5.7x less Flash compared to quantized MobileNetV2 and ResNet-18. On visual\&audio wake words tasks, MCUNet achieves state-of-the-art accuracy and runs 2.4-3.4x faster than MobileNetV2 and ProxylessNAS-based solutions with 3.7-4.1x smaller peak SRAM. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived. Code and models can be found here: https://tinyml.mit.edu.}, + annote = {Comment: NeurIPS 2020 (spotlight)}, + author = {Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Cohn, John and Gan, Chuang and Han, Song}, + doi = {10.48550/arXiv.2007.10319}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/IX2JN4P9/Lin et al. - 2020 - MCUNet Tiny Deep Learning on IoT Devices.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/BAKHZ46Y/2007.html:text/html}, + keywords = {Computer Science - Computer Vision and Pattern Recognition}, + language = {en}, + month = nov, + note = {arXiv:2007.10319 [cs]}, + publisher = {arXiv}, + shorttitle = {{MCUNet}}, + title = {{MCUNet}: {Tiny} {Deep} {Learning} on {IoT} {Devices}}, + url = {http://arxiv.org/abs/2007.10319}, + urldate = {2023-10-20}, + year = 2020, + Bdsk-Url-1 = {http://arxiv.org/abs/2007.10319}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2007.10319}} @inproceedings{lin2014microsoft, - title = {Microsoft coco: Common objects in context}, - author = {Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence}, - year = 2014, - booktitle = {Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13}, - pages = {740--755}, + author = {Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and
Zitnick, C Lawrence}, + booktitle = {Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13}, organization = {Springer}, -} + pages = {740--755}, + title = {Microsoft coco: Common objects in context}, + year = 2014} @article{lin2020mcunet, - title = {Mcunet: Tiny deep learning on iot devices}, - author = {Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song and others}, - year = 2020, - journal = {Advances in Neural Information Processing Systems}, - volume = 33, - pages = {11711--11722}, - eprint = {2007.10319}, archiveprefix = {arXiv}, + author = {Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song and others}, + eprint = {2007.10319}, + journal = {Advances in Neural Information Processing Systems}, + pages = {11711--11722}, primaryclass = {cs.CV}, -} + title = {Mcunet: Tiny deep learning on iot devices}, + volume = 33, + year = 2020} @article{lin2022device, - title = {On-device training under 256kb memory}, - author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, - year = 2022, - journal = {Advances in Neural Information Processing Systems}, - volume = 35, - pages = {22941--22954}, -} - -@inproceedings{lin2022ondevice, - title = {On-Device Training Under 256KB Memory}, - author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, - year = 2022, - booktitle = {ArXiv}, -} - -@article{lin2023awq, - title = {AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration}, - author = {Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song}, - year = 2023, - journal = {arXiv}, -} - -@article{lindholm_nvidia_2008, - title = {{NVIDIA} {Tesla}: {A} {Unified} {Graphics} and {Computing} {Architecture}}, - author = {Lindholm, Erik and Nickolls, John and Oberman, Stuart and Montrym, John}, - year = 2008, - month = mar, - journal = 
{IEEE Micro}, - volume = 28, - number = 2, - pages = {39--55}, - url = {https://ieeexplore.ieee.org/document/4523358}, - urldate = {2023-11-07}, - note = {Conference Name: IEEE Micro}, -} - -@article{loh20083d, - title = {3D-stacked memory architectures for multi-core processors}, - author = {Loh, Gabriel H}, - year = 2008, - journal = {ACM SIGARCH computer architecture news}, - publisher = {ACM New York, NY, USA}, - volume = 36, - number = 3, - pages = {453--464}, -} + author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, + journal = {Advances in Neural Information Processing Systems}, + pages = {22941--22954}, + title = {On-device training under 256kb memory}, + volume = 35, + year = 2022} @misc{lu_notes_2016, - title = {Notes on {Low}-rank {Matrix} {Factorization}}, - author = {Lu, Yuan and Yang, Jie}, - year = 2016, - month = may, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1507.00333}, - urldate = {2023-10-20}, - note = {arXiv:1507.00333 [cs]}, -} - -@inproceedings{luebke2008cuda, - title = {CUDA: Scalable parallel programming for high-performance scientific computing}, - author = {Luebke, David}, - year = 2008, - booktitle = {2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro}, - pages = {836--838}, -} + abstract = {Low-rank matrix factorization (MF) is an important technique in data science. The key idea of MF is that there exists latent structures in the data, by uncovering which we could obtain a compressed representation of the data. By factorizing an original matrix to low-rank matrices, MF provides a unified method for dimension reduction, clustering, and matrix completion. In this article we review several important variants of MF, including: Basic MF, Non-negative MF, Orthogonal non-negative MF. As can be told from their names, non-negative MF and orthogonal non-negative MF are variants of basic MF with non-negativity and/or orthogonality constraints. 
Such constraints are useful in specific scenarios. In the first part of this article, we introduce, for each of these models, the application scenarios, the distinctive properties, and the optimizing method. By properly adapting MF, we can go beyond the problem of clustering and matrix completion. In the second part of this article, we will extend MF to sparse matrix completion, enhance matrix completion using various regularization methods, and make use of MF for (semi-)supervised learning by introducing latent space reinforcement and transformation. We will see that MF is not only a useful model but also a flexible framework that is applicable for various prediction problems.}, + author = {Lu, Yuan and Yang, Jie}, + doi = {10.48550/arXiv.1507.00333}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/4QED5ZU9/Lu and Yang - 2016 - Notes on Low-rank Matrix Factorization.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/XIBZBDJQ/1507.html:text/html}, + keywords = {Computer Science - Information Retrieval, Computer Science - Machine Learning, Mathematics - Numerical Analysis}, + month = may, + note = {arXiv:1507.00333 [cs]}, + publisher = {arXiv}, + title = {Notes on {Low}-rank {Matrix} {Factorization}}, + url = {http://arxiv.org/abs/1507.00333}, + urldate = {2023-10-20}, + year = 2016, + Bdsk-Url-1 = {http://arxiv.org/abs/1507.00333}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1507.00333}} @article{lundberg2017unified, - title = {A unified approach to interpreting model predictions}, - author = {Lundberg, Scott M and Lee, Su-In}, - year = 2017, - journal = {Advances in neural information processing systems}, - volume = 30, -} - -@article{maass1997networks, - title = {Networks of spiking neurons: the third generation of neural network models}, - author = {Maass, Wolfgang}, - year = 1997, - journal = {Neural networks}, - publisher = {Elsevier}, - volume = 10, - number = 9, - pages = {1659--1671}, -} - -@article{markovic2020, -
title = {Physics for neuromorphic computing}, - author = {Markovi{\'c}, Danijela and Mizrahi, Alice and Querlioz, Damien and Grollier, Julie}, - year = 2020, - journal = {Nature Reviews Physics}, - publisher = {Nature Publishing Group UK London}, - volume = 2, - number = 9, - pages = {499--510}, -} + author = {Lundberg, Scott M and Lee, Su-In}, + journal = {Advances in neural information processing systems}, + title = {A unified approach to interpreting model predictions}, + volume = 30, + year = 2017} @article{mattson2020mlperf, - title = {Mlperf training benchmark}, - author = {Mattson, Peter and Cheng, Christine and Diamos, Gregory and Coleman, Cody and Micikevicius, Paulius and Patterson, David and Tang, Hanlin and Wei, Gu-Yeon and Bailis, Peter and Bittorf, Victor and others}, - year = 2020, - journal = {Proceedings of Machine Learning and Systems}, - volume = 2, - pages = {336--349}, -} + author = {Mattson, Peter and Cheng, Christine and Diamos, Gregory and Coleman, Cody and Micikevicius, Paulius and Patterson, David and Tang, Hanlin and Wei, Gu-Yeon and Bailis, Peter and Bittorf, Victor and others}, + journal = {Proceedings of Machine Learning and Systems}, + pages = {336--349}, + title = {Mlperf training benchmark}, + volume = 2, + year = 2020} @inproceedings{mcmahan2017communication, - title = {Communication-efficient learning of deep networks from decentralized data}, - author = {McMahan, Brendan and Moore, Eider and Ramage, Daniel and Hampson, Seth and y Arcas, Blaise Aguera}, - year = 2017, - booktitle = {Artificial intelligence and statistics}, - pages = {1273--1282}, + author = {McMahan, Brendan and Moore, Eider and Ramage, Daniel and Hampson, Seth and y Arcas, Blaise Aguera}, + booktitle = {Artificial intelligence and statistics}, organization = {PMLR}, -} + pages = {1273--1282}, + title = {Communication-efficient learning of deep networks from decentralized data}, + year = 2017} @inproceedings{mcmahan2023communicationefficient, - title = 
{Communication-efficient learning of deep networks from decentralized data}, - author = {McMahan, Brendan and Moore, Eider and Ramage, Daniel and Hampson, Seth and y Arcas, Blaise Aguera}, - year = 2017, - booktitle = {Artificial intelligence and statistics}, - pages = {1273--1282}, + author = {McMahan, Brendan and Moore, Eider and Ramage, Daniel and Hampson, Seth and y Arcas, Blaise Aguera}, + booktitle = {Artificial intelligence and statistics}, organization = {PMLR}, -} - -@article{miller2000optical, - title = {Optical interconnects to silicon}, - author = {Miller, David AB}, - year = 2000, - journal = {IEEE Journal of Selected Topics in Quantum Electronics}, - publisher = {IEEE}, - volume = 6, - number = 6, - pages = {1312--1317}, -} - -@article{mittal2021survey, - title = {A survey of SRAM-based in-memory computing techniques and applications}, - author = {Mittal, Sparsh and Verma, Gaurav and Kaushik, Brajesh and Khanday, Farooq A}, - year = 2021, - journal = {Journal of Systems Architecture}, - publisher = {Elsevier}, - volume = 119, - pages = 102276, -} - -@article{modha2023neural, - title = {Neural inference at the frontier of energy, space, and time}, - author = {Modha, Dharmendra S and Akopyan, Filipp and Andreopoulos, Alexander and Appuswamy, Rathinakumar and Arthur, John V and Cassidy, Andrew S and Datta, Pallab and DeBole, Michael V and Esser, Steven K and Otero, Carlos Ortega and others}, - year = 2023, - journal = {Science}, - publisher = {American Association for the Advancement of Science}, - volume = 382, - number = 6668, - pages = {329--335}, -} + pages = {1273--1282}, + title = {Communication-efficient learning of deep networks from decentralized data}, + year = 2017} @article{moshawrab2023reviewing, - title = {Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations and Future Perspectives}, - author = {Moshawrab, Mohammad and Adda, Mehdi and Bouzouane, Abdenour and Ibrahim, Hussein and Raad, Ali}, - year = 
2023, - journal = {Electronics}, - publisher = {MDPI}, - volume = 12, - number = 10, - pages = 2287, -} - -@inproceedings{munshi2009opencl, - title = {The OpenCL specification}, - author = {Munshi, Aaftab}, - year = 2009, - booktitle = {2009 IEEE Hot Chips 21 Symposium (HCS)}, - pages = {1--314}, -} - -@article{musk2019integrated, - title = {An integrated brain-machine interface platform with thousands of channels}, - author = {Musk, Elon and others}, - year = 2019, - journal = {Journal of medical Internet research}, - publisher = {JMIR Publications Inc., Toronto, Canada}, - volume = 21, - number = 10, - pages = {e16194}, -} + author = {Moshawrab, Mohammad and Adda, Mehdi and Bouzouane, Abdenour and Ibrahim, Hussein and Raad, Ali}, + journal = {Electronics}, + number = 10, + pages = 2287, + publisher = {MDPI}, + title = {Reviewing Federated Learning Aggregation Algorithms; Strategies, Contributions, Limitations and Future Perspectives}, + volume = 12, + year = 2023} @inproceedings{nguyen2023re, - title = {Re-thinking Model Inversion Attacks Against Deep Neural Networks}, - author = {Nguyen, Ngoc-Bao and Chandrasegaran, Keshigeyan and Abdollahzadeh, Milad and Cheung, Ngai-Man}, - year = 2023, - booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, - pages = {16384--16393}, -} - -@misc{noauthor_amd_nodate, - title = {{AMD} {Radeon} {RX} 7000 {Series} {Desktop} {Graphics} {Cards}}, - url = {https://www.amd.com/en/graphics/radeon-rx-graphics}, - urldate = {2023-11-07}, -} + author = {Nguyen, Ngoc-Bao and Chandrasegaran, Keshigeyan and Abdollahzadeh, Milad and Cheung, Ngai-Man}, + booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + pages = {16384--16393}, + title = {Re-thinking Model Inversion Attacks Against Deep Neural Networks}, + year = 2023} @misc{noauthor_deep_nodate, - title = {Deep {Learning} {Model} {Compression} (ii) {\textbar} by {Ivy} {Gu} {\textbar} {Medium}}, - author 
= {Ivy Gu}, - year = 2023, - url = {https://ivygdy.medium.com/deep-learning-model-compression-ii-546352ea9453}, - urldate = {2023-10-20}, -} - -@misc{noauthor_evolution_2023, - title = {The {Evolution} of {Audio} {DSPs}}, - year = 2023, - month = oct, - journal = {audioXpress}, - url = {https://audioxpress.com/article/the-evolution-of-audio-dsps}, - urldate = {2023-11-07}, - language = {en}, -} - -@misc{noauthor_fpga_nodate, - title = {{FPGA} {Architecture} {Overview}}, - url = {https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/fpga-architecture-overview.html}, - urldate = {2023-11-07}, -} - -@misc{noauthor_google_2023, - title = {Google {Tensor} {G3}: {The} new chip that gives your {Pixel} an {AI} upgrade}, - year = 2023, - month = oct, - journal = {Google}, - url = {https://blog.google/products/pixel/google-tensor-g3-pixel-8/}, - urldate = {2023-11-07}, - language = {en-us}, -} - -@misc{noauthor_hexagon_nodate, - title = {Hexagon {DSP} {SDK} {Processor}}, - journal = {Qualcomm Developer Network}, - url = {https://developer.qualcomm.com/software/hexagon-dsp-sdk/dsp-processor}, - urldate = {2023-11-07}, - language = {en}, -} - -@misc{noauthor_integrated_2023, - title = {Integrated circuit}, - year = 2023, - month = nov, - journal = {Wikipedia}, - url = {https://en.wikipedia.org/w/index.php?title=Integrated\_circuit\&oldid=1183537457}, - urldate = {2023-11-07}, - copyright = {Creative Commons Attribution-ShareAlike License}, - note = {Page Version ID: 1183537457}, - language = {en}, -} - -@misc{noauthor_intel_nodate, - title = {Intel\textregistered{} {Arc}\texttrademark{} {Graphics} {Overview}}, - journal = {Intel}, - url = {https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html}, - urldate = {2023-11-07}, - language = {en}, -} + author = {Ivy Gu}, + title = {Deep {Learning} {Model} {Compression} (ii) {\textbar} by {Ivy} {Gu} {\textbar} {Medium}}, + url = 
{https://ivygdy.medium.com/deep-learning-model-compression-ii-546352ea9453}, + urldate = {2023-10-20}, + year = {2023}, + Bdsk-Url-1 = {https://ivygdy.medium.com/deep-learning-model-compression-ii-546352ea9453}} @misc{noauthor_introduction_nodate, - title = {An {Introduction} to {Separable} {Convolutions} - {Analytics} {Vidhya}}, - author = {Hegde, Sumant}, - year = 2023, - url = {https://www.analyticsvidhya.com/blog/2021/11/an-introduction-to-separable-convolutions/}, - urldate = {2023-10-20}, -} + author = {Hegde, Sumant}, + title = {An {Introduction} to {Separable} {Convolutions} - {Analytics} {Vidhya}}, + url = {https://www.analyticsvidhya.com/blog/2021/11/an-introduction-to-separable-convolutions/}, + urldate = {2023-10-20}, + year = {2023}, + Bdsk-Url-1 = {https://www.analyticsvidhya.com/blog/2021/11/an-introduction-to-separable-convolutions/}} @misc{noauthor_knowledge_nodate, - title = {Knowledge {Distillation} - {Neural} {Network} {Distiller}}, - author = {IntelLabs}, - year = 2023, - url = {https://intellabs.github.io/distiller/knowledge\_distillation.html}, - urldate = {2023-10-20}, -} - -@misc{noauthor_project_nodate, - title = {Project {Catapult} - {Microsoft} {Research}}, - url = {https://www.microsoft.com/en-us/research/project/project-catapult/}, - urldate = {2023-11-07}, -} - -@misc{noauthor_what_nodate, - title = {What is an {FPGA}? {Field} {Programmable} {Gate} {Array}}, - journal = {AMD}, - url = {https://www.xilinx.com/products/silicon-devices/fpga/what-is-an-fpga.html}, - urldate = {2023-11-07}, - language = {en}, -} - -@misc{noauthor_who_nodate, - title = {Who {Invented} the {Microprocessor}? - {CHM}}, - url = {https://computerhistory.org/blog/who-invented-the-microprocessor/}, - urldate = {2023-11-07}, -} - -@inproceedings{Norman2017TPUv1, - title = {In-Datacenter Performance Analysis of a Tensor Processing Unit}, - author = {Jouppi, Norman P. 
and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and Boyle, Rick and Cantin, Pierre-luc and Chao, Clifford and Clark, Chris and Coriell, Jeremy and Daley, Mike and Dau, Matt and Dean, Jeffrey and Gelb, Ben and Ghaemmaghami, Tara Vazir and Gottipati, Rajendra and Gulland, William and Hagmann, Robert and Ho, C. Richard and Hogberg, Doug and Hu, John and Hundt, Robert and Hurt, Dan and Ibarz, Julian and Jaffey, Aaron and Jaworski, Alek and Kaplan, Alexander and Khaitan, Harshit and Killebrew, Daniel and Koch, Andy and Kumar, Naveen and Lacy, Steve and Laudon, James and Law, James and Le, Diemthu and Leary, Chris and Liu, Zhuyuan and Lucke, Kyle and Lundin, Alan and MacKean, Gordon and Maggiore, Adriana and Mahony, Maire and Miller, Kieran and Nagarajan, Rahul and Narayanaswami, Ravi and Ni, Ray and Nix, Kathy and Norrie, Thomas and Omernick, Mark and Penukonda, Narayana and Phelps, Andy and Ross, Jonathan and Ross, Matt and Salek, Amir and Samadiani, Emad and Severn, Chris and Sizikov, Gregory and Snelham, Matthew and Souter, Jed and Steinberg, Dan and Swing, Andy and Tan, Mercedes and Thorson, Gregory and Tian, Bo and Toma, Horia and Tuttle, Erick and Vasudevan, Vijay and Walter, Richard and Wang, Walter and Wilcox, Eric and Yoon, Doe Hyun}, - year = 2017, - booktitle = {Proceedings of the 44th Annual International Symposium on Computer Architecture}, - location = {Toronto, ON, Canada}, - publisher = {Association for Computing Machinery}, - address = {New York, NY, USA}, - series = {ISCA '17}, - pages = {1--12}, - isbn = 9781450348928, - url = {https://doi.org/10.1145/3079856.3080246}, - numpages = 12, -} - -@article{Norrie2021TPUv2_3, - title = {The Design Process for Google's Training Chips: TPUv2 and TPUv3}, - author = {Norrie, Thomas and Patil, Nishant and Yoon, Doe Hyun and Kurian, George and Li, Sheng and Laudon, James and Young, Cliff and Jouppi, 
Norman and Patterson, David}, - year = 2021, - journal = {IEEE Micro}, - volume = 41, - number = 2, - pages = {56--63}, -} + author = {IntelLabs}, + title = {Knowledge {Distillation} - {Neural} {Network} {Distiller}}, + url = {https://intellabs.github.io/distiller/knowledge_distillation.html}, + urldate = {2023-10-20}, + year = {2023}, + Bdsk-Url-1 = {https://intellabs.github.io/distiller/knowledge_distillation.html}} @article{Northcutt_Athalye_Mueller_2021, - title = {Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks}, - author = {Northcutt, Curtis G and Athalye, Anish and Mueller, Jonas}, - year = 2021, - month = mar, - journal = {arXiv}, -} + author = {Northcutt, Curtis G and Athalye, Anish and Mueller, Jonas}, + doi = {10.48550/arXiv.2103.14749}, + journal = {arXiv}, + month = {Mar}, + title = {Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks}, + year = {2021}, + Bdsk-Url-1 = {https://doi.org/10.48550/arXiv.2103.14749}} @inproceedings{ooko2021tinyml, - title = {TinyML in Africa: Opportunities and challenges}, - author = {Ooko, Samson Otieno and Ogore, Marvin Muyonga and Nsenga, Jimmy and Zennaro, Marco}, - year = 2021, - booktitle = {2021 IEEE Globecom Workshops (GC Wkshps)}, - pages = {1--6}, + author = {Ooko, Samson Otieno and Ogore, Marvin Muyonga and Nsenga, Jimmy and Zennaro, Marco}, + booktitle = {2021 IEEE Globecom Workshops (GC Wkshps)}, organization = {IEEE}, -} + pages = {1--6}, + title = {TinyML in Africa: Opportunities and challenges}, + year = {2021}} @misc{ou_low_2023, - title = {Low {Rank} {Optimization} for {Efficient} {Deep} {Learning}: {Making} {A} {Balance} between {Compact} {Architecture} and {Fast} {Training}}, - author = {Ou, Xinwei and Chen, Zhangxin and Zhu, Ce and Liu, Yipeng}, - year = 2023, - month = mar, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2303.13635}, - urldate = 
{2023-10-20}, - note = {arXiv:2303.13635 [cs]}, -} + abstract = {Deep neural networks have achieved great success in many data processing applications. However, the high computational complexity and storage cost makes deep learning hard to be used on resource-constrained devices, and it is not environmental-friendly with much power cost. In this paper, we focus on low-rank optimization for efficient deep learning techniques. In the space domain, deep neural networks are compressed by low rank approximation of the network parameters, which directly reduces the storage requirement with a smaller number of network parameters. In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training for fast convergence. The model compression in the spatial domain is summarized into three categories as pre-train, pre-set, and compression-aware methods, respectively. With a series of integrable techniques discussed, such as sparse pruning, quantization, and entropy coding, we can ensemble them in an integration framework with lower computational complexity and storage. Besides of summary of recent technical advances, we have two findings for motivating future works: one is that the effective rank outperforms other sparse measures for network compression. The other is a spatial and temporal balance for tensorized neural networks.}, + author = {Ou, Xinwei and Chen, Zhangxin and Zhu, Ce and Liu, Yipeng}, + file = {arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/SPSZ2HR9/2303.html:text/html;Full Text PDF:/Users/jeffreyma/Zotero/storage/6TUEBTEX/Ou et al. 
- 2023 - Low Rank Optimization for Efficient Deep Learning.pdf:application/pdf}, + keywords = {Computer Science - Machine Learning}, + month = {Mar}, + note = {arXiv:2303.13635 [cs]}, + publisher = {arXiv}, + shorttitle = {Low {Rank} {Optimization} for {Efficient} {Deep} {Learning}}, + title = {Low {Rank} {Optimization} for {Efficient} {Deep} {Learning}: {Making} {A} {Balance} between {Compact} {Architecture} and {Fast} {Training}}, + url = {http://arxiv.org/abs/2303.13635}, + urldate = {2023-10-20}, + year = {2023}, + Bdsk-Url-1 = {http://arxiv.org/abs/2303.13635}} @article{pan_survey_2010, - title = {A {Survey} on {Transfer} {Learning}}, - author = {Pan, Sinno Jialin and Yang, Qiang}, - year = 2010, - month = oct, - journal = {IEEE Transactions on Knowledge and Data Engineering}, - volume = 22, - number = 10, - pages = {1345--1359}, - url = {http://ieeexplore.ieee.org/document/5288526/}, - urldate = {2023-10-25}, - language = {en}, -} + author = {Pan, Sinno Jialin and Yang, Qiang}, + doi = {10.1109/TKDE.2009.191}, + file = {Pan and Yang - 2010 - A Survey on Transfer Learning.pdf:/Users/alex/Zotero/storage/T3H8E5K8/Pan and Yang - 2010 - A Survey on Transfer Learning.pdf:application/pdf}, + issn = {1041-4347}, + journal = {IEEE Transactions on Knowledge and Data Engineering}, + language = {en}, + month = {Oct}, + number = {10}, + pages = {1345--1359}, + title = {A {Survey} on {Transfer} {Learning}}, + url = {http://ieeexplore.ieee.org/document/5288526/}, + urldate = {2023-10-25}, + volume = {22}, + year = {2010}, + Bdsk-Url-1 = {http://ieeexplore.ieee.org/document/5288526/}, + Bdsk-Url-2 = {https://doi.org/10.1109/TKDE.2009.191}} @article{pan2009survey, - title = {A survey on transfer learning}, - author = {Pan, Sinno Jialin and Yang, Qiang}, - year = 2009, - journal = {IEEE Transactions on knowledge and data engineering}, - publisher = {IEEE}, - volume = 22, - number = 10, - pages = {1345--1359}, -} + author = {Pan, Sinno Jialin and Yang, Qiang}, + journal = {IEEE 
Transactions on knowledge and data engineering}, + number = {10}, + pages = {1345--1359}, + publisher = {IEEE}, + title = {A survey on transfer learning}, + volume = {22}, + year = {2009}} @article{parisi_continual_2019, - title = {Continual lifelong learning with neural networks: {A} review}, - author = {Parisi, German I. and Kemker, Ronald and Part, Jose L. and Kanan, Christopher and Wermter, Stefan}, - year = 2019, - month = may, - journal = {Neural Networks}, - volume = 113, - pages = {54--71}, - url = {https://linkinghub.elsevier.com/retrieve/pii/S0893608019300231}, - urldate = {2023-10-26}, - language = {en}, -} + author = {Parisi, German I. and Kemker, Ronald and Part, Jose L. and Kanan, Christopher and Wermter, Stefan}, + doi = {10.1016/j.neunet.2019.01.012}, + file = {Parisi et al. - 2019 - Continual lifelong learning with neural networks .pdf:/Users/alex/Zotero/storage/TCGHD5TW/Parisi et al. - 2019 - Continual lifelong learning with neural networks .pdf:application/pdf}, + issn = {08936080}, + journal = {Neural Networks}, + language = {en}, + month = {May}, + pages = {54--71}, + shorttitle = {Continual lifelong learning with neural networks}, + title = {Continual lifelong learning with neural networks: {A} review}, + url = {https://linkinghub.elsevier.com/retrieve/pii/S0893608019300231}, + urldate = {2023-10-26}, + volume = {113}, + year = {2019}, + Bdsk-Url-1 = {https://linkinghub.elsevier.com/retrieve/pii/S0893608019300231}, + Bdsk-Url-2 = {https://doi.org/10.1016/j.neunet.2019.01.012}} @article{paszke2019pytorch, - title = {Pytorch: An imperative style, high-performance deep learning library}, - author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and others}, - year = 2019, - journal = {Advances in neural information processing systems}, - volume = 32, -} - -@book{patterson2016computer, - title = {Computer 
organization and design ARM edition: the hardware software interface}, - author = {Patterson, David A and Hennessy, John L}, - year = 2016, - publisher = {Morgan kaufmann}, -} + author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and others}, + journal = {Advances in neural information processing systems}, + title = {Pytorch: An imperative style, high-performance deep learning library}, + volume = {32}, + year = {2019}} @misc{Perrigo_2023, - title = {OpenAI used Kenyan workers on less than \$2 per hour: Exclusive}, - author = {Perrigo, Billy}, - year = 2023, - month = jan, - journal = {Time}, - publisher = {Time}, - url = {https://time.com/6247678/openai-chatgpt-kenya-workers/}, -} - -@article{plasma, - title = {Noninvasive assessment of dofetilide plasma concentration using a deep learning (neural network) analysis of the surface electrocardiogram: A proof of concept study}, - author = {Attia, Zachi and Sugrue, Alan and Asirvatham, Samuel and Ackerman, Michael and Kapa, Suraj and Friedman, Paul and Noseworthy, Peter}, - year = 2018, - month = {08}, - journal = {PLOS ONE}, - volume = 13, - pages = {e0201059}, -} + author = {Perrigo, Billy}, + journal = {Time}, + month = {Jan}, + publisher = {Time}, + title = {OpenAI used Kenyan workers on less than \$2 per hour: Exclusive}, + url = {https://time.com/6247678/openai-chatgpt-kenya-workers/}, + year = {2023}, + Bdsk-Url-1 = {https://time.com/6247678/openai-chatgpt-kenya-workers/}} @inproceedings{Prakash_2023, - title = {{CFU} Playground: Full-Stack Open-Source Framework for Tiny Machine Learning ({TinyML}) Acceleration on {FPGAs}}, - author = {Shvetank Prakash and Tim Callahan and Joseph Bushagour and Colby Banbury and Alan V. 
Green and Pete Warden and Tim Ansell and Vijay Janapa Reddi}, - year = 2023, - month = apr, - booktitle = {2023 {IEEE} International Symposium on Performance Analysis of Systems and Software ({ISPASS})}, - publisher = {{IEEE}}, - url = {https://doi.org/10.1109\%2Fispass57527.2023.00024}, -} + author = {Shvetank Prakash and Tim Callahan and Joseph Bushagour and Colby Banbury and Alan V. Green and Pete Warden and Tim Ansell and Vijay Janapa Reddi}, + booktitle = {2023 {IEEE} International Symposium on Performance Analysis of Systems and Software ({ISPASS})}, + doi = {10.1109/ispass57527.2023.00024}, + month = {apr}, + publisher = {{IEEE}}, + title = {{CFU} Playground: Full-Stack Open-Source Framework for Tiny Machine Learning ({TinyML}) Acceleration on {FPGAs}}, + url = {https://doi.org/10.1109%2Fispass57527.2023.00024}, + year = {2023}, + Bdsk-Url-1 = {https://doi.org/10.1109%2Fispass57527.2023.00024}, + Bdsk-Url-2 = {https://doi.org/10.1109/ispass57527.2023.00024}} @inproceedings{prakash_cfu_2023, - title = {{CFU} {Playground}: {Full}-{Stack} {Open}-{Source} {Framework} for {Tiny} {Machine} {Learning} ({tinyML}) {Acceleration} on {FPGAs}}, - author = {Prakash, Shvetank and Callahan, Tim and Bushagour, Joseph and Banbury, Colby and Green, Alan V. and Warden, Pete and Ansell, Tim and Reddi, Vijay Janapa}, - year = 2023, - month = apr, - booktitle = {2023 {IEEE} {International} {Symposium} on {Performance} {Analysis} of {Systems} and {Software} ({ISPASS})}, - pages = {157--167}, - url = {http://arxiv.org/abs/2201.01863}, - urldate = {2023-10-25}, - note = {arXiv:2201.01863 [cs]}, - language = {en}, -} + author = {Prakash, Shvetank and Callahan, Tim and Bushagour, Joseph and Banbury, Colby and Green, Alan V. and Warden, Pete and Ansell, Tim and Reddi, Vijay Janapa}, + booktitle = {2023 {IEEE} {International} {Symposium} on {Performance} {Analysis} of {Systems} and {Software} ({ISPASS})}, + doi = {10.1109/ISPASS57527.2023.00024}, + file = {Prakash et al. 
- 2023 - CFU Playground Full-Stack Open-Source Framework f.pdf:/Users/alex/Zotero/storage/BZNRIDTL/Prakash et al. - 2023 - CFU Playground Full-Stack Open-Source Framework f.pdf:application/pdf}, + keywords = {Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Hardware Architecture}, + language = {en}, + month = {Apr}, + note = {arXiv:2201.01863 [cs]}, + pages = {157--167}, + shorttitle = {{CFU} {Playground}}, + title = {{CFU} {Playground}: {Full}-{Stack} {Open}-{Source} {Framework} for {Tiny} {Machine} {Learning} ({tinyML}) {Acceleration} on {FPGAs}}, + url = {http://arxiv.org/abs/2201.01863}, + urldate = {2023-10-25}, + year = {2023}, + Bdsk-Url-1 = {http://arxiv.org/abs/2201.01863}, + Bdsk-Url-2 = {https://doi.org/10.1109/ISPASS57527.2023.00024}} @article{preparednesspublic, - title = {Public Health Law}, - author = {Preparedness, Emergency}, -} + author = {Preparedness, Emergency}, + title = {Public Health Law}} @article{Pushkarna_Zaldivar_Kjartansson_2022, - title = {Data cards: Purposeful and transparent dataset documentation for responsible ai}, - author = {Pushkarna, Mahima and Zaldivar, Andrew and Kjartansson, Oddur}, - year = 2022, - journal = {2022 ACM Conference on Fairness, Accountability, and Transparency}, -} - -@article{putnam_reconfigurable_2014, - title = {A reconfigurable fabric for accelerating large-scale datacenter services}, - author = {Putnam, Andrew and Caulfield, Adrian M. and Chung, Eric S. 
and Chiou, Derek and Constantinides, Kypros and Demme, John and Esmaeilzadeh, Hadi and Fowers, Jeremy and Gopal, Gopi Prashanth and Gray, Jan and Haselman, Michael and Hauck, Scott and Heil, Stephen and Hormati, Amir and Kim, Joo-Young and Lanka, Sitaram and Larus, James and Peterson, Eric and Pope, Simon and Smith, Aaron and Thong, Jason and Xiao, Phillip Yi and Burger, Doug}, - year = 2014, - month = oct, - journal = {ACM SIGARCH Computer Architecture News}, - volume = 42, - number = 3, - pages = {13--24}, - url = {https://dl.acm.org/doi/10.1145/2678373.2665678}, - urldate = {2023-11-07}, - language = {en}, -} + author = {Pushkarna, Mahima and Zaldivar, Andrew and Kjartansson, Oddur}, + doi = {10.1145/3531146.3533231}, + journal = {2022 ACM Conference on Fairness, Accountability, and Transparency}, + title = {Data cards: Purposeful and transparent dataset documentation for responsible ai}, + year = {2022}, + Bdsk-Url-1 = {https://doi.org/10.1145/3531146.3533231}} @article{qi_efficient_2021, - title = {An efficient pruning scheme of deep neural networks for {Internet} of {Things} applications}, - author = {Qi, Chen and Shen, Shibo and Li, Rongpeng and Zhifeng, Zhao and Liu, Qing and Liang, Jing and Zhang, Honggang}, - year = 2021, - month = jun, - journal = {EURASIP Journal on Advances in Signal Processing}, - volume = 2021, -} + abstract = {Nowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities like sensing, imaging, classification, recognition, etc. However, the computational-intensive requirement of DNNs makes it difficult to be applicable for resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs, by uncovering a more compact structure and learning the effective weights therein, on the basis of not compromising the expressive capability of DNNs. 
In particular, our algorithm can achieve efficient end-to-end training that transfers a redundant neural network to a compact one with a specifically targeted compression rate directly. We comprehensively evaluate our approach on various representative benchmark datasets and compared with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to significantly reduce its FLOPs (floating-point operations) and number of parameters with a proportion of 76.2\% and 94.1\%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.}, + author = {Qi, Chen and Shen, Shibo and Li, Rongpeng and Zhifeng, Zhao and Liu, Qing and Liang, Jing and Zhang, Honggang}, + doi = {10.1186/s13634-021-00744-4}, + file = {Full Text PDF:/Users/jeffreyma/Zotero/storage/AGWCC5VS/Qi et al. 
- 2021 - An efficient pruning scheme of deep neural network.pdf:application/pdf}, + journal = {EURASIP Journal on Advances in Signal Processing}, + month = {Jun}, + title = {An efficient pruning scheme of deep neural networks for {Internet} of {Things} applications}, + volume = 2021, + year = {2021}, + Bdsk-Url-1 = {https://doi.org/10.1186/s13634-021-00744-4}} @misc{quantdeep, - title = {Quantizing deep convolutional networks for efficient inference: A whitepaper}, - author = {Krishnamoorthi}, - year = 2018, - month = jun, - publisher = {arXiv}, - url = {https://arxiv.org/abs/1806.08342}, - urldate = {2018-06-21}, -} - -@inproceedings{raina_large-scale_2009, - title = {Large-scale deep unsupervised learning using graphics processors}, - author = {Raina, Rajat and Madhavan, Anand and Ng, Andrew Y.}, - year = 2009, - month = jun, - booktitle = {Proceedings of the 26th {Annual} {International} {Conference} on {Machine} {Learning}}, - publisher = {ACM}, - address = {Montreal Quebec Canada}, - pages = {873--880}, - isbn = {978-1-60558-516-1}, - url = {https://dl.acm.org/doi/10.1145/1553374.1553486}, - urldate = {2023-11-07}, - language = {en}, -} + author = {Krishnamoorthi}, + doi = {10.48550/arXiv.1806.08342}, + month = jun, + publisher = {arXiv}, + title = {Quantizing deep convolutional networks for efficient inference: A whitepaper}, + url = {https://arxiv.org/abs/1806.08342}, + urldate = {2018-06-21}, + year = 2018, + Bdsk-Url-1 = {https://arxiv.org/abs/1806.08342}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1806.08342}} @article{ramcharan2017deep, - title = {Deep learning for image-based cassava disease detection}, - author = {Ramcharan, Amanda and Baranowski, Kelsee and McCloskey, Peter and Ahmed, Babuali and Legg, James and Hughes, David P}, - year = 2017, - journal = {Frontiers in plant science}, - publisher = {Frontiers Media SA}, - volume = 8, - pages = 1852, -} - -@article{Ranganathan2011-dc, - title = {From microprocessors to nanostores: Rethinking 
data-centric systems}, - author = {Ranganathan, Parthasarathy}, - year = 2011, - month = jan, - journal = {Computer (Long Beach Calif.)}, - publisher = {Institute of Electrical and Electronics Engineers (IEEE)}, - volume = 44, - number = 1, - pages = {39--48}, -} + author = {Ramcharan, Amanda and Baranowski, Kelsee and McCloskey, Peter and Ahmed, Babuali and Legg, James and Hughes, David P}, + journal = {Frontiers in plant science}, + pages = 1852, + publisher = {Frontiers Media SA}, + title = {Deep learning for image-based cassava disease detection}, + volume = 8, + year = 2017} @misc{Rao_2021, - author = {Rao, Ravi}, - year = 2021, - month = dec, - journal = {www.wevolver.com}, - url = {https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies}, -} - -@article{Ratner_Hancock_Dunnmon_Goldman_Ré_2018, - title = {Snorkel metal: Weak supervision for multi-task learning.}, - author = {Ratner, Alex and Hancock, Braden and Dunnmon, Jared and Goldman, Roger and R\'{e}, Christopher}, - year = 2018, - journal = {Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning}, -} + author = {Rao, Ravi}, + journal = {www.wevolver.com}, + month = {Dec}, + url = {https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies}, + year = 2021, + Bdsk-Url-1 = {https://www.wevolver.com/article/tinyml-unlocks-new-possibilities-for-sustainable-development-technologies}} @inproceedings{reddi2020mlperf, - title = {Mlperf inference benchmark}, - author = {Reddi, Vijay Janapa and Cheng, Christine and Kanter, David and Mattson, Peter and Schmuelling, Guenther and Wu, Carole-Jean and Anderson, Brian and Breughe, Maximilien and Charlebois, Mark and Chou, William and others}, - year = 2020, - booktitle = {2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)}, - pages = {446--459}, + author = {Reddi, Vijay Janapa and Cheng, Christine and Kanter, 
David and Mattson, Peter and Schmuelling, Guenther and Wu, Carole-Jean and Anderson, Brian and Breughe, Maximilien and Charlebois, Mark and Chou, William and others}, + booktitle = {2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)}, organization = {IEEE}, -} + pages = {446--459}, + title = {Mlperf inference benchmark}, + year = 2020} @inproceedings{ribeiro2016should, - title = {" Why should i trust you?" Explaining the predictions of any classifier}, - author = {Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos}, - year = 2016, - booktitle = {Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining}, - pages = {1135--1144}, -} + author = {Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos}, + booktitle = {Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining}, + pages = {1135--1144}, + title = {" Why should i trust you?" Explaining the predictions of any classifier}, + year = 2016} @book{rosenblatt1957perceptron, - title = {The perceptron, a perceiving and recognizing automaton Project Para}, - author = {Rosenblatt, Frank}, - year = 1957, - publisher = {Cornell Aeronautical Laboratory}, -} - -@article{roskies2002neuroethics, - title = {Neuroethics for the new millenium}, - author = {Roskies, Adina}, - year = 2002, - journal = {Neuron}, - publisher = {Elsevier}, - volume = 35, - number = 1, - pages = {21--23}, -} + author = {Rosenblatt, Frank}, + publisher = {Cornell Aeronautical Laboratory}, + title = {The perceptron, a perceiving and recognizing automaton Project Para}, + year = 1957} @inproceedings{rouhani2017tinydl, - title = {TinyDL: Just-in-time deep learning solution for constrained embedded systems}, - author = {Rouhani, Bita and Mirhoseini, Azalia and Koushanfar, Farinaz}, - year = 2017, - month = {05}, - pages = {1--4}, -} + author = {Rouhani, Bita and Mirhoseini, Azalia and Koushanfar, Farinaz}, + doi = 
{10.1109/ISCAS.2017.8050343}, + month = {05}, + pages = {1--4}, + title = {TinyDL: Just-in-time deep learning solution for constrained embedded systems}, + year = 2017, + Bdsk-Url-1 = {https://doi.org/10.1109/ISCAS.2017.8050343}} @article{rumelhart1986learning, - title = {Learning representations by back-propagating errors}, - author = {Rumelhart, David E and Hinton, Geoffrey E and Williams, Ronald J}, - year = 1986, - journal = {nature}, - publisher = {Nature Publishing Group UK London}, - volume = 323, - number = 6088, - pages = {533--536}, -} + author = {Rumelhart, David E and Hinton, Geoffrey E and Williams, Ronald J}, + journal = {nature}, + number = 6088, + pages = {533--536}, + publisher = {Nature Publishing Group UK London}, + title = {Learning representations by back-propagating errors}, + volume = 323, + year = 1986} @article{ruvolo_ella_nodate, - title = {{ELLA}: {An} {Efficient} {Lifelong} {Learning} {Algorithm}}, - author = {Ruvolo, Paul and Eaton, Eric}, - language = {en}, -} - -@article{samajdar2018scale, - title = {Scale-sim: Systolic cnn accelerator simulator}, - author = {Samajdar, Ananda and Zhu, Yuhao and Whatmough, Paul and Mattina, Matthew and Krishna, Tushar}, - year = 2018, - journal = {arXiv preprint arXiv:1811.02883}, -} + author = {Ruvolo, Paul and Eaton, Eric}, + file = {Ruvolo and Eaton - ELLA An Efficient Lifelong Learning Algorithm.pdf:/Users/alex/Zotero/storage/QA5G29GL/Ruvolo and Eaton - ELLA An Efficient Lifelong Learning Algorithm.pdf:application/pdf}, + language = {en}, + title = {{ELLA}: {An} {Efficient} {Lifelong} {Learning} {Algorithm}}} @misc{ScaleAI, - journal = {ScaleAI}, - url = {https://scale.com/data-engine}, -} - -@article{schuman2022, - title = {Opportunities for neuromorphic computing algorithms and applications}, - author = {Schuman, Catherine D and Kulkarni, Shruti R and Parsa, Maryam and Mitchell, J Parker and Date, Prasanna and Kay, Bill}, - year = 2022, - journal = {Nature Computational Science}, - publisher = 
{Nature Publishing Group US New York}, - volume = 2, - number = 1, - pages = {10--19}, -} + journal = {ScaleAI}, + url = {https://scale.com/data-engine}, + Bdsk-Url-1 = {https://scale.com/data-engine}} @inproceedings{schwarzschild2021just, - title = {Just how toxic is data poisoning? a unified benchmark for backdoor and data poisoning attacks}, - author = {Schwarzschild, Avi and Goldblum, Micah and Gupta, Arjun and Dickerson, John P and Goldstein, Tom}, - year = 2021, - booktitle = {International Conference on Machine Learning}, - pages = {9389--9398}, + author = {Schwarzschild, Avi and Goldblum, Micah and Gupta, Arjun and Dickerson, John P and Goldstein, Tom}, + booktitle = {International Conference on Machine Learning}, organization = {PMLR}, -} - -@article{sculley2015hidden, - title = {Hidden technical debt in machine learning systems}, - author = {Sculley, David and Holt, Gary and Golovin, Daniel and Davydov, Eugene and Phillips, Todd and Ebner, Dietmar and Chaudhary, Vinay and Young, Michael and Crespo, Jean-Francois and Dennison, Dan}, - year = 2015, - journal = {Advances in neural information processing systems}, - volume = 28, -} + pages = {9389--9398}, + title = {Just how toxic is data poisoning? 
a unified benchmark for backdoor and data poisoning attacks}, + year = 2021} @misc{see_compression_2016, - title = {Compression of {Neural} {Machine} {Translation} {Models} via {Pruning}}, - author = {See, Abigail and Luong, Minh-Thang and Manning, Christopher D.}, - year = 2016, - month = jun, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1606.09274}, - urldate = {2023-10-20}, - note = {arXiv:1606.09274 [cs]}, -} - -@misc{segal1999opengl, - title = {The OpenGL graphics system: A specification (version 1.1)}, - author = {Segal, Mark and Akeley, Kurt}, - year = 1999, -} - -@article{segura2018ethical, - title = {Ethical implications of user perceptions of wearable devices}, - author = {Segura Anaya, LH and Alsadoon, Abeer and Costadopoulos, Nectar and Prasad, PWC}, - year = 2018, - journal = {Science and engineering ethics}, - publisher = {Springer}, - volume = 24, - pages = {1--28}, -} + abstract = {Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT models, namely class-blind, class-uniform, and class-distribution, which differ in terms of how pruning thresholds are computed for the different classes of weights in the NMT architecture. We demonstrate the efficacy of weight pruning as a compression technique for a state-of-the-art NMT system. We show that an NMT model with over 200 million parameters can be pruned by 40\% with very little performance loss as measured on the WMT'14 English-German translation task. This sheds light on the distribution of redundancy in the NMT architecture. 
Our main result is that with retraining, we can recover and even surpass the original performance with an 80\%-pruned model.}, + author = {See, Abigail and Luong, Minh-Thang and Manning, Christopher D.}, + doi = {10.48550/arXiv.1606.09274}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/2CJ4TSNR/See et al. - 2016 - Compression of Neural Machine Translation Models v.pdf:application/pdf}, + keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Neural and Evolutionary Computing}, + month = jun, + note = {arXiv:1606.09274 [cs]}, + publisher = {arXiv}, + title = {Compression of {Neural} {Machine} {Translation} {Models} via {Pruning}}, + url = {http://arxiv.org/abs/1606.09274}, + urldate = {2023-10-20}, + year = 2016, + Bdsk-Url-1 = {http://arxiv.org/abs/1606.09274}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1606.09274}} @inproceedings{seide2016cntk, - title = {CNTK: Microsoft's open-source deep-learning toolkit}, - author = {Seide, Frank and Agarwal, Amit}, - year = 2016, - booktitle = {Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining}, - pages = {2135--2135}, -} + author = {Seide, Frank and Agarwal, Amit}, + booktitle = {Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining}, + pages = {2135--2135}, + title = {CNTK: Microsoft's open-source deep-learning toolkit}, + year = 2016} @misc{sevilla_compute_2022, - title = {Compute {Trends} {Across} {Three} {Eras} of {Machine} {Learning}}, - author = {Sevilla, Jaime and Heim, Lennart and Ho, Anson and Besiroglu, Tamay and Hobbhahn, Marius and Villalobos, Pablo}, - year = 2022, - month = mar, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2202.05924}, - urldate = {2023-10-25}, - note = {arXiv:2202.05924 [cs]}, - language = {en}, -} + author = {Sevilla, Jaime and Heim, Lennart and Ho, Anson and Besiroglu, Tamay and Hobbhahn, Marius and 
Villalobos, Pablo}, + file = {Sevilla et al. - 2022 - Compute Trends Across Three Eras of Machine Learni.pdf:/Users/alex/Zotero/storage/24N9RZ72/Sevilla et al. - 2022 - Compute Trends Across Three Eras of Machine Learni.pdf:application/pdf}, + keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society}, + language = {en}, + month = mar, + note = {arXiv:2202.05924 [cs]}, + publisher = {arXiv}, + title = {Compute {Trends} {Across} {Three} {Eras} of {Machine} {Learning}}, + url = {http://arxiv.org/abs/2202.05924}, + urldate = {2023-10-25}, + year = 2022, + Bdsk-Url-1 = {http://arxiv.org/abs/2202.05924}} @article{seyedzadeh2018machine, - title = {Machine learning for estimation of building energy consumption and performance: a review}, - author = {Seyedzadeh, Saleh and Rahimian, Farzad Pour and Glesk, Ivan and Roper, Marc}, - year = 2018, - journal = {Visualization in Engineering}, - publisher = {Springer}, - volume = 6, - pages = {1--20}, -} + author = {Seyedzadeh, Saleh and Rahimian, Farzad Pour and Glesk, Ivan and Roper, Marc}, + journal = {Visualization in Engineering}, + pages = {1--20}, + publisher = {Springer}, + title = {Machine learning for estimation of building energy consumption and performance: a review}, + volume = 6, + year = 2018} @article{shamir1979share, - title = {How to share a secret}, - author = {Shamir, Adi}, - year = 1979, - journal = {Communications of the ACM}, - publisher = {ACm New York, NY, USA}, - volume = 22, - number = 11, - pages = {612--613}, -} - -@article{shastri2021photonics, - title = {Photonics for artificial intelligence and neuromorphic computing}, - author = {Shastri, Bhavin J and Tait, Alexander N and Ferreira de Lima, Thomas and Pernice, Wolfram HP and Bhaskaran, Harish and Wright, C David and Prucnal, Paul R}, - year = 2021, - journal = {Nature Photonics}, - publisher = {Nature Publishing Group UK London}, - volume = 15, - number = 2, - pages = 
{102--114}, -} + author = {Shamir, Adi}, + journal = {Communications of the ACM}, + number = 11, + pages = {612--613}, + publisher = {ACM New York, NY, USA}, + title = {How to share a secret}, + volume = 22, + year = 1979} @article{Sheng_Zhang_2019, - title = {Machine learning with crowdsourcing: A brief summary of the past research and Future Directions}, - author = {Sheng, Victor S. and Zhang, Jing}, - year = 2019, - journal = {Proceedings of the AAAI Conference on Artificial Intelligence}, - volume = 33, - number = {01}, - pages = {9837–9843}, -} + author = {Sheng, Victor S. and Zhang, Jing}, + doi = {10.1609/aaai.v33i01.33019837}, + journal = {Proceedings of the AAAI Conference on Artificial Intelligence}, + number = {01}, + pages = {9837--9843}, + title = {Machine learning with crowdsourcing: A brief summary of the past research and Future Directions}, + volume = 33, + year = 2019, + Bdsk-Url-1 = {https://doi.org/10.1609/aaai.v33i01.33019837}} @misc{Sheth_2022, - title = {Eletect - TinyML and IOT based Smart Wildlife Tracker}, - author = {Sheth, Dhruv}, - year = 2022, - month = mar, - journal = {Hackster.io}, - url = {https://www.hackster.io/dhruvsheth\_/eletect-tinyml-and-iot-based-smart-wildlife-tracker-c03e5a}, -} + author = {Sheth, Dhruv}, + journal = {Hackster.io}, + month = {Mar}, + title = {Eletect - TinyML and IOT based Smart Wildlife Tracker}, + url = {https://www.hackster.io/dhruvsheth_/eletect-tinyml-and-iot-based-smart-wildlife-tracker-c03e5a}, + year = 2022, + Bdsk-Url-1 = {https://www.hackster.io/dhruvsheth_/eletect-tinyml-and-iot-based-smart-wildlife-tracker-c03e5a}} @inproceedings{shi2022data, - title = {Data selection for efficient model update in federated learning}, - author = {Shi, Hongrui and Radu, Valentin}, - year = 2022, - booktitle = {Proceedings of the 2nd European Workshop on Machine Learning and Systems}, - pages = {72--78}, -} + author = {Shi, Hongrui and Radu, Valentin}, + booktitle = {Proceedings of the 2nd European Workshop on 
Machine Learning and Systems}, + pages = {72--78}, + title = {Data selection for efficient model update in federated learning}, + year = 2022} @article{smestad2023systematic, - title = {A Systematic Literature Review on Client Selection in Federated Learning}, - author = {Smestad, Carl and Li, Jingyue}, - year = 2023, - journal = {arXiv preprint arXiv:2306.04862}, -} + author = {Smestad, Carl and Li, Jingyue}, + journal = {arXiv preprint arXiv:2306.04862}, + title = {A Systematic Literature Review on Client Selection in Federated Learning}, + year = 2023} @misc{smoothquant, - title = {SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models}, - author = {Xiao and Lin, Seznec and Wu, Demouth and Han}, - year = 2023, - url = {https://arxiv.org/abs/2211.10438}, - urldate = {2023-06-05}, -} - -@inproceedings{suda2016throughput, - title = {Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks}, - author = {Suda, Naveen and Chandra, Vikas and Dasika, Ganesh and Mohanty, Abinash and Ma, Yufei and Vrudhula, Sarma and Seo, Jae-sun and Cao, Yu}, - year = 2016, - booktitle = {Proceedings of the 2016 ACM/SIGDA international symposium on field-programmable gate arrays}, - pages = {16--25}, -} + abstract = {Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. Based on the fact that weights are easy to quantize while activations are not, SmoothQuant smooths the activation outliers by offline migrating the quantization difficulty from activations to weights with a mathematically equivalent transformation. 
SmoothQuant enables an INT8 quantization of both weights and activations for all the matrix multiplications in LLMs, including OPT, BLOOM, GLM, MT-NLG, and LLaMA family. We demonstrate up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy. SmoothQuant enables serving 530B LLM within a single node. Our work offers a turn-key solution that reduces hardware costs and democratizes LLMs.}, + author = {Xiao, Guangxuan and Lin, Ji and Seznec, Mickael and Wu, Hao and Demouth, Julien and Han, Song}, + doi = {10.48550/arXiv.2211.10438}, + title = {SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models}, + url = {https://arxiv.org/abs/2211.10438}, + urldate = {2023-06-05}, + year = 2023, + Bdsk-Url-1 = {https://arxiv.org/abs/2211.10438}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2211.10438}} @misc{surveyofquant, - title = {A Survey of Quantization Methods for Efficient Neural Network Inference)}, - author = {Gholami and Kim, Dong and Yao, Mahoney and Keutzer}, - year = 2021, - url = {https://arxiv.org/abs/2103.13630}, - urldate = {2021-06-21}, -} - -@article{Sze2017-ak, - title = {Efficient processing of deep neural networks: A tutorial and survey}, - author = {Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel}, - year = 2017, - month = mar, - copyright = {http://arxiv.org/licenses/nonexclusive-distrib/1.0/}, - archiveprefix = {arXiv}, - primaryclass = {cs.CV}, - eprint = {1703.09039}, -} - -@article{sze2017efficient, - title = {Efficient processing of deep neural networks: A tutorial and survey}, - author = {Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel S}, - year = 2017, - journal = {Proceedings of the IEEE}, - publisher = {Ieee}, - volume = 105, - number = 12, - pages = {2295--2329}, -} + abstract = {As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. 
Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. 
With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.}, + author = {Gholami, Amir and Kim, Sehoon and Dong, Zhen and Yao, Zhewei and Mahoney, Michael W. and Keutzer, Kurt}, + doi = {10.48550/arXiv.2103.13630}, + title = {A Survey of Quantization Methods for Efficient Neural Network Inference}, + url = {https://arxiv.org/abs/2103.13630}, + urldate = {2021-06-21}, + year = 2021, + Bdsk-Url-1 = {https://arxiv.org/abs/2103.13630}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.2103.13630}} @misc{tan_efficientnet_2020, - title = {{EfficientNet}: {Rethinking} {Model} {Scaling} for {Convolutional} {Neural} {Networks}}, - author = {Tan, Mingxing and Le, Quoc V.}, - year = 2020, - month = sep, - publisher = {arXiv}, - url = {http://arxiv.org/abs/1905.11946}, - urldate = {2023-10-20}, - note = {arXiv:1905.11946 [cs, stat]}, -} + abstract = {Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3\% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. 
Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7\%), Flowers (98.8\%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.}, + author = {Tan, Mingxing and Le, Quoc V.}, + doi = {10.48550/arXiv.1905.11946}, + file = {arXiv Fulltext PDF:/Users/jeffreyma/Zotero/storage/KISBF35I/Tan and Le - 2020 - EfficientNet Rethinking Model Scaling for Convolu.pdf:application/pdf;arXiv.org Snapshot:/Users/jeffreyma/Zotero/storage/TUD4PH4M/1905.html:text/html}, + keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning}, + month = sep, + note = {arXiv:1905.11946 [cs, stat]}, + publisher = {arXiv}, + shorttitle = {{EfficientNet}}, + title = {{EfficientNet}: {Rethinking} {Model} {Scaling} for {Convolutional} {Neural} {Networks}}, + url = {http://arxiv.org/abs/1905.11946}, + urldate = {2023-10-20}, + year = 2020, + Bdsk-Url-1 = {http://arxiv.org/abs/1905.11946}, + Bdsk-Url-2 = {https://doi.org/10.48550/arXiv.1905.11946}} @inproceedings{tan2019mnasnet, - title = {Mnasnet: Platform-aware neural architecture search for mobile}, - author = {Tan, Mingxing and Chen, Bo and Pang, Ruoming and Vasudevan, Vijay and Sandler, Mark and Howard, Andrew and Le, Quoc V}, - year = 2019, - booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, - pages = {2820--2828}, -} + author = {Tan, Mingxing and Chen, Bo and Pang, Ruoming and Vasudevan, Vijay and Sandler, Mark and Howard, Andrew and Le, Quoc V}, + booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, + pages = {2820--2828}, + title = {Mnasnet: Platform-aware neural architecture search for mobile}, + year = 2019} @misc{tan2020efficientnet, - title = {EfficientNet: Rethinking Model Scaling for Convolutional Neural 
Networks}, - author = {Mingxing Tan and Quoc V. Le}, - year = 2020, - eprint = {1905.11946}, archiveprefix = {arXiv}, + author = {Mingxing Tan and Quoc V. Le}, + eprint = {1905.11946}, primaryclass = {cs.LG}, -} - -@article{tang2022soft, - title = {Soft bioelectronics for cardiac interfaces}, - author = {Tang, Xin and He, Yichun and Liu, Jia}, - year = 2022, - journal = {Biophysics Reviews}, - publisher = {AIP Publishing}, - volume = 3, - number = 1, -} - -@article{tang2023flexible, - title = {Flexible brain--computer interfaces}, - author = {Tang, Xin and Shen, Hao and Zhao, Siyuan and Li, Na and Liu, Jia}, - year = 2023, - journal = {Nature Electronics}, - publisher = {Nature Publishing Group UK London}, - volume = 6, - number = 2, - pages = {109--118}, -} + title = {EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks}, + year = 2020} @misc{Team_2023, - title = {Data-centric AI for the Enterprise}, - author = {Team, Snorkel}, - year = 2023, - month = aug, - journal = {Snorkel AI}, - url = {https://snorkel.ai/}, -} + author = {Team, Snorkel}, + journal = {Snorkel AI}, + month = {Aug}, + title = {Data-centric AI for the Enterprise}, + url = {https://snorkel.ai/}, + year = 2023, + Bdsk-Url-1 = {https://snorkel.ai/}} @misc{Thefutur92:online, - title = {The future is being built on Arm: Market diversification continues to drive strong royalty and licensing growth as ecosystem reaches quarter of a trillion chips milestone – Arm\textregistered{}}, - author = {ARM.com}, - note = {(Accessed on 09/16/2023)}, + author = {ARM.com}, howpublished = {\url{https://www.arm.com/company/news/2023/02/arm-announces-q3-fy22-results}}, -} + note = {(Accessed on 09/16/2023)}, + title = {The future is being built on Arm: Market diversification continues to drive strong royalty and licensing growth as ecosystem reaches quarter of a trillion chips milestone -- Arm{\textregistered}}} @misc{threefloat, - title = {Three Floating Point Formats}, - author = {Google}, - year 
= 2023, - url = {https://storage.googleapis.com/gweb-cloudblog-publish/images/Three\_floating-point\_formats.max-624x261.png}, - urldate = {2023-10-20}, -} + author = {Google}, + title = {Three Floating Point Formats}, + url = {https://storage.googleapis.com/gweb-cloudblog-publish/images/Three_floating-point_formats.max-624x261.png}, + urldate = {2023-10-20}, + year = 2023, + Bdsk-Url-1 = {https://storage.googleapis.com/gweb-cloudblog-publish/images/Three_floating-point_formats.max-624x261.png}} @article{tirtalistyani2022indonesia, - title = {Indonesia rice irrigation system: Time for innovation}, - author = {Tirtalistyani, Rose and Murtiningrum, Murtiningrum and Kanwar, Rameshwar S}, - year = 2022, - journal = {Sustainability}, - publisher = {MDPI}, - volume = 14, - number = 19, - pages = 12477, -} + author = {Tirtalistyani, Rose and Murtiningrum, Murtiningrum and Kanwar, Rameshwar S}, + journal = {Sustainability}, + number = 19, + pages = 12477, + publisher = {MDPI}, + title = {Indonesia rice irrigation system: Time for innovation}, + volume = 14, + year = 2022} @inproceedings{tokui2015chainer, - title = {Chainer: a next-generation open source framework for deep learning}, - author = {Tokui, Seiya and Oono, Kenta and Hido, Shohei and Clayton, Justin}, - year = 2015, - booktitle = {Proceedings of workshop on machine learning systems (LearningSys) in the twenty-ninth annual conference on neural information processing systems (NIPS)}, - volume = 5, - pages = {1--6}, -} + author = {Tokui, Seiya and Oono, Kenta and Hido, Shohei and Clayton, Justin}, + booktitle = {Proceedings of workshop on machine learning systems (LearningSys) in the twenty-ninth annual conference on neural information processing systems (NIPS)}, + pages = {1--6}, + title = {Chainer: a next-generation open source framework for deep learning}, + volume = 5, + year = 2015} @article{van_de_ven_three_2022, - title = {Three types of incremental learning}, - author = {Van De Ven, Gido M. 
and Tuytelaars, Tinne and Tolias, Andreas S.}, - year = 2022, - month = dec, - journal = {Nature Machine Intelligence}, - volume = 4, - number = 12, - pages = {1185--1197}, - url = {https://www.nature.com/articles/s42256-022-00568-3}, - urldate = {2023-10-26}, - language = {en}, -} + author = {Van De Ven, Gido M. and Tuytelaars, Tinne and Tolias, Andreas S.}, + doi = {10.1038/s42256-022-00568-3}, + file = {Van De Ven et al. - 2022 - Three types of incremental learning.pdf:/Users/alex/Zotero/storage/5ZAHXMQN/Van De Ven et al. - 2022 - Three types of incremental learning.pdf:application/pdf}, + issn = {2522-5839}, + journal = {Nature Machine Intelligence}, + language = {en}, + month = dec, + number = 12, + pages = {1185--1197}, + title = {Three types of incremental learning}, + url = {https://www.nature.com/articles/s42256-022-00568-3}, + urldate = {2023-10-26}, + volume = 4, + year = 2022, + Bdsk-Url-1 = {https://www.nature.com/articles/s42256-022-00568-3}, + Bdsk-Url-2 = {https://doi.org/10.1038/s42256-022-00568-3}} @article{vaswani2017attention, - title = {Attention is all you need}, - author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia}, - year = 2017, - journal = {Advances in neural information processing systems}, - volume = 30, -} + author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia}, + journal = {Advances in neural information processing systems}, + title = {Attention is all you need}, + volume = 30, + year = 2017} @misc{Vectorbo78:online, - title = {Vector-borne diseases}, - note = {(Accessed on 10/17/2023)}, howpublished = {\url{https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases}}, -} + note = {(Accessed on 10/17/2023)}, + title = {Vector-borne diseases}} @misc{Verma_2022, - title = {Elephant AI}, - author = {Verma, Team 
Dual\_Boot: Swapnil}, - year = 2022, - month = mar, - journal = {Hackster.io}, - url = {https://www.hackster.io/dual\_boot/elephant-ai-ba71e9}, -} - -@article{verma2019memory, - title = {In-memory computing: Advances and prospects}, - author = {Verma, Naveen and Jia, Hongyang and Valavi, Hossein and Tang, Yinqi and Ozatay, Murat and Chen, Lung-Yen and Zhang, Bonan and Deaville, Peter}, - year = 2019, - journal = {IEEE Solid-State Circuits Magazine}, - publisher = {IEEE}, - volume = 11, - number = 3, - pages = {43--55}, -} + author = {Verma, Team Dual_Boot: Swapnil}, + journal = {Hackster.io}, + month = {Mar}, + title = {Elephant AI}, + url = {https://www.hackster.io/dual_boot/elephant-ai-ba71e9}, + year = 2022, + Bdsk-Url-1 = {https://www.hackster.io/dual_boot/elephant-ai-ba71e9}} @misc{villalobos_machine_2022, - title = {Machine {Learning} {Model} {Sizes} and the {Parameter} {Gap}}, - author = {Villalobos, Pablo and Sevilla, Jaime and Besiroglu, Tamay and Heim, Lennart and Ho, Anson and Hobbhahn, Marius}, - year = 2022, - month = jul, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2207.02852}, - urldate = {2023-10-25}, - note = {arXiv:2207.02852 [cs]}, - language = {en}, -} + author = {Villalobos, Pablo and Sevilla, Jaime and Besiroglu, Tamay and Heim, Lennart and Ho, Anson and Hobbhahn, Marius}, + file = {Villalobos et al. - 2022 - Machine Learning Model Sizes and the Parameter Gap.pdf:/Users/alex/Zotero/storage/WW69A82B/Villalobos et al. 
- 2022 - Machine Learning Model Sizes and the Parameter Gap.pdf:application/pdf}, + keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Computation and Language}, + language = {en}, + month = jul, + note = {arXiv:2207.02852 [cs]}, + publisher = {arXiv}, + title = {Machine {Learning} {Model} {Sizes} and the {Parameter} {Gap}}, + url = {http://arxiv.org/abs/2207.02852}, + urldate = {2023-10-25}, + year = 2022, + Bdsk-Url-1 = {http://arxiv.org/abs/2207.02852}} @misc{villalobos_trends_2022, - title = {Trends in {Training} {Dataset} {Sizes}}, - author = {Villalobos, Pablo and Ho, Anson}, - year = 2022, - month = sep, - journal = {Epoch AI}, - url = {https://epochai.org/blog/trends-in-training-dataset-sizes}, -} + author = {Villalobos, Pablo and Ho, Anson}, + journal = {Epoch AI}, + month = sep, + title = {Trends in {Training} {Dataset} {Sizes}}, + url = {https://epochai.org/blog/trends-in-training-dataset-sizes}, + year = 2022, + Bdsk-Url-1 = {https://epochai.org/blog/trends-in-training-dataset-sizes}} @misc{VinBrain, - journal = {VinBrain}, - url = {https://vinbrain.net/aiscaler}, -} + journal = {VinBrain}, + url = {https://vinbrain.net/aiscaler}, + Bdsk-Url-1 = {https://vinbrain.net/aiscaler}} @article{vinuesa2020role, - title = {The role of artificial intelligence in achieving the Sustainable Development Goals}, - author = {Vinuesa, Ricardo and Azizpour, Hossein and Leite, Iolanda and Balaam, Madeline and Dignum, Virginia and Domisch, Sami and Fell{\"a}nder, Anna and Langhans, Simone Daniela and Tegmark, Max and Fuso Nerini, Francesco}, - year = 2020, - journal = {Nature communications}, - publisher = {Nature Publishing Group}, - volume = 11, - number = 1, - pages = {1--10}, -} - -@article{Vivet2021, - title = {IntAct: A 96-Core Processor With Six Chiplets 3D-Stacked on an Active Interposer With Distributed Interconnects and Integrated Power Management}, - author = 
{Vivet, Pascal and Guthmuller, Eric and Thonnart, Yvain and Pillonnet, Gael and Fuguet, C\'{e}sar and Miro-Panades, Ivan and Moritz, Guillaume and Durupt, Jean and Bernard, Christian and Varreau, Didier and Pontes, Julian and Thuries, S\'{e}bastien and Coriat, David and Harrand, Michel and Dutoit, Denis and Lattard, Didier and Arnaud, Lucile and Charbonnier, Jean and Coudrain, Perceval and Garnier, Arnaud and Berger, Fr\'{e}d\'{e}ric and Gueugnot, Alain and Greiner, Alain and Meunier, Quentin L. and Farcy, Alexis and Arriordaz, Alexandre and Ch\'{e}ramy, S\'{e}verine and Clermidy, Fabien}, - year = 2021, - journal = {IEEE Journal of Solid-State Circuits}, - volume = 56, - number = 1, - pages = {79--97}, -} - -@inproceedings{wang2020apq, - title = {APQ: Joint Search for Network Architecture, Pruning and Quantization Policy}, - author = {Wang, Tianzhe and Wang, Kuan and Cai, Han and Lin, Ji and Liu, Zhijian and Wang, Hanrui and Lin, Yujun and Han, Song}, - year = 2020, - booktitle = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, - pages = {2075--2084}, -} + author = {Vinuesa, Ricardo and Azizpour, Hossein and Leite, Iolanda and Balaam, Madeline and Dignum, Virginia and Domisch, Sami and Fell{\"a}nder, Anna and Langhans, Simone Daniela and Tegmark, Max and Fuso Nerini, Francesco}, + journal = {Nature communications}, + number = 1, + pages = {1--10}, + publisher = {Nature Publishing Group}, + title = {The role of artificial intelligence in achieving the Sustainable Development Goals}, + volume = 11, + year = 2020} @article{warden2018speech, - title = {Speech commands: A dataset for limited-vocabulary speech recognition}, - author = {Warden, Pete}, - year = 2018, - journal = {arXiv preprint arXiv:1804.03209}, -} + author = {Warden, Pete}, + journal = {arXiv preprint arXiv:1804.03209}, + title = {Speech commands: A dataset for limited-vocabulary speech recognition}, + year = 2018} @book{warden2019tinyml, - title = {Tinyml: Machine learning 
with tensorflow lite on arduino and ultra-low-power microcontrollers}, - author = {Warden, Pete and Situnayake, Daniel}, - year = 2019, - publisher = {O'Reilly Media}, -} - -@article{wearableinsulin, - title = {Wearable Insulin Biosensors for Diabetes Management: Advances and Challenges}, - author = {Psoma, Sotiria D. and Kanthou, Chryso}, - year = 2023, - journal = {Biosensors}, - volume = 13, - number = 7, - url = {https://www.mdpi.com/2079-6374/13/7/719}, - article-number = 719, - pubmedid = 37504117, -} - -@book{weik_survey_1955, - title = {A {Survey} of {Domestic} {Electronic} {Digital} {Computing} {Systems}}, - author = {Weik, Martin H.}, - year = 1955, - publisher = {Ballistic Research Laboratories}, - language = {en}, -} + author = {Warden, Pete and Situnayake, Daniel}, + publisher = {O'Reilly Media}, + title = {Tinyml: Machine learning with tensorflow lite on arduino and ultra-low-power microcontrollers}, + year = 2019} @article{weiss_survey_2016, - title = {A survey of transfer learning}, - author = {Weiss, Karl and Khoshgoftaar, Taghi M. and Wang, DingDing}, - year = 2016, - month = dec, - journal = {Journal of Big Data}, - volume = 3, - number = 1, - pages = 9, - url = {http://journalofbigdata.springeropen.com/articles/10.1186/s40537-016-0043-6}, - urldate = {2023-10-25}, - language = {en}, -} - -@article{wong2012metal, - title = {Metal--oxide RRAM}, - author = {Wong, H-S Philip and Lee, Heng-Yuan and Yu, Shimeng and Chen, Yu-Sheng and Wu, Yi and Chen, Pang-Shiu and Lee, Byoungil and Chen, Frederick T and Tsai, Ming-Jinn}, - year = 2012, - journal = {Proceedings of the IEEE}, - publisher = {IEEE}, - volume = 100, - number = 6, - pages = {1951--1970}, -} + author = {Weiss, Karl and Khoshgoftaar, Taghi M. and Wang, DingDing}, + doi = {10.1186/s40537-016-0043-6}, + file = {Weiss et al. - 2016 - A survey of transfer learning.pdf:/Users/alex/Zotero/storage/3FN2Y6EA/Weiss et al. 
- 2016 - A survey of transfer learning.pdf:application/pdf}, + issn = {2196-1115}, + journal = {Journal of Big Data}, + language = {en}, + month = dec, + number = 1, + pages = 9, + title = {A survey of transfer learning}, + url = {http://journalofbigdata.springeropen.com/articles/10.1186/s40537-016-0043-6}, + urldate = {2023-10-25}, + volume = 3, + year = 2016, + Bdsk-Url-1 = {http://journalofbigdata.springeropen.com/articles/10.1186/s40537-016-0043-6}, + Bdsk-Url-2 = {https://doi.org/10.1186/s40537-016-0043-6}} @inproceedings{wu2019fbnet, - title = {Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search}, - author = {Wu, Bichen and Dai, Xiaoliang and Zhang, Peizhao and Wang, Yanghan and Sun, Fei and Wu, Yiming and Tian, Yuandong and Vajda, Peter and Jia, Yangqing and Keutzer, Kurt}, - year = 2019, - booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, - pages = {10734--10742}, -} + author = {Wu, Bichen and Dai, Xiaoliang and Zhang, Peizhao and Wang, Yanghan and Sun, Fei and Wu, Yiming and Tian, Yuandong and Vajda, Peter and Jia, Yangqing and Keutzer, Kurt}, + booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, + pages = {10734--10742}, + title = {Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search}, + year = 2019} @article{wu2022sustainable, - title = {Sustainable ai: Environmental implications, challenges and opportunities}, - author = {Wu, Carole-Jean and Raghavendra, Ramya and Gupta, Udit and Acun, Bilge and Ardalani, Newsha and Maeng, Kiwan and Chang, Gloria and Aga, Fiona and Huang, Jinshi and Bai, Charles and others}, - year = 2022, - journal = {Proceedings of Machine Learning and Systems}, - volume = 4, - pages = {795--813}, -} + author = {Wu, Carole-Jean and Raghavendra, Ramya and Gupta, Udit and Acun, Bilge and Ardalani, Newsha and Maeng, Kiwan and Chang, Gloria and Aga, Fiona and Huang, 
Jinshi and Bai, Charles and others}, + journal = {Proceedings of Machine Learning and Systems}, + pages = {795--813}, + title = {Sustainable ai: Environmental implications, challenges and opportunities}, + volume = 4, + year = 2022} @inproceedings{xie2020adversarial, - title = {Adversarial examples improve image recognition}, - author = {Xie, Cihang and Tan, Mingxing and Gong, Boqing and Wang, Jiang and Yuille, Alan L and Le, Quoc V}, - year = 2020, - booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, - pages = {819--828}, -} - -@article{xiong_mri-based_2021, - title = {{MRI}-based brain tumor segmentation using {FPGA}-accelerated neural network}, - author = {Xiong, Siyu and Wu, Guoqing and Fan, Xitian and Feng, Xuan and Huang, Zhongcheng and Cao, Wei and Zhou, Xuegong and Ding, Shijin and Yu, Jinhua and Wang, Lingli and Shi, Zhifeng}, - year = 2021, - month = sep, - journal = {BMC Bioinformatics}, - volume = 22, - number = 1, - pages = 421, - url = {https://doi.org/10.1186/s12859-021-04347-6}, - urldate = {2023-11-07}, -} - -@article{xiu2019time, - title = {Time Moore: Exploiting Moore's Law from the perspective of time}, - author = {Xiu, Liming}, - year = 2019, - journal = {IEEE Solid-State Circuits Magazine}, - publisher = {IEEE}, - volume = 11, - number = 1, - pages = {39--55}, -} + author = {Xie, Cihang and Tan, Mingxing and Gong, Boqing and Wang, Jiang and Yuille, Alan L and Le, Quoc V}, + booktitle = {Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, + pages = {819--828}, + title = {Adversarial examples improve image recognition}, + year = 2020} @article{xu2018alternating, - title = {Alternating multi-bit quantization for recurrent neural networks}, - author = {Xu, Chen and Yao, Jianqiang and Lin, Zhouchen and Ou, Wenwu and Cao, Yuanbin and Wang, Zhirong and Zha, Hongbin}, - year = 2018, - journal = {arXiv preprint arXiv:1802.00150}, -} + author = {Xu, Chen and Yao, Jianqiang and 
Lin, Zhouchen and Ou, Wenwu and Cao, Yuanbin and Wang, Zhirong and Zha, Hongbin}, + journal = {arXiv preprint arXiv:1802.00150}, + title = {Alternating multi-bit quantization for recurrent neural networks}, + year = 2018} @article{xu2023demystifying, - title = {Demystifying CLIP Data}, - author = {Xu, Hu and Xie, Saining and Tan, Xiaoqing Ellen and Huang, Po-Yao and Howes, Russell and Sharma, Vasu and Li, Shang-Wen and Ghosh, Gargi and Zettlemoyer, Luke and Feichtenhofer, Christoph}, - year = 2023, - journal = {arXiv preprint arXiv:2309.16671}, -} + author = {Xu, Hu and Xie, Saining and Tan, Xiaoqing Ellen and Huang, Po-Yao and Howes, Russell and Sharma, Vasu and Li, Shang-Wen and Ghosh, Gargi and Zettlemoyer, Luke and Feichtenhofer, Christoph}, + journal = {arXiv preprint arXiv:2309.16671}, + title = {Demystifying CLIP Data}, + year = 2023} @article{xu2023federated, - title = {Federated Learning of Gboard Language Models with Differential Privacy}, - author = {Xu, Zheng and Zhang, Yanxiang and Andrew, Galen and Choquette-Choo, Christopher A and Kairouz, Peter and McMahan, H Brendan and Rosenstock, Jesse and Zhang, Yuanbo}, - year = 2023, - journal = {arXiv preprint arXiv:2305.18465}, -} + author = {Xu, Zheng and Zhang, Yanxiang and Andrew, Galen and Choquette-Choo, Christopher A and Kairouz, Peter and McMahan, H Brendan and Rosenstock, Jesse and Zhang, Yuanbo}, + journal = {arXiv preprint arXiv:2305.18465}, + title = {Federated Learning of Gboard Language Models with Differential Privacy}, + year = 2023} @article{yamashita2023coffee, - title = {Coffee disease classification at the edge using deep learning}, - author = {Yamashita, Jo{\~a}o Vitor Yukio Bordin and Leite, Jo{\~a}o Paulo RR}, - year = 2023, - journal = {Smart Agricultural Technology}, - publisher = {Elsevier}, - volume = 4, - pages = 100183, -} + author = {Yamashita, Jo{\~a}o Vitor Yukio Bordin and Leite, Jo{\~a}o Paulo RR}, + journal = {Smart Agricultural Technology}, + pages = 100183, + publisher = 
{Elsevier}, + title = {Coffee disease classification at the edge using deep learning}, + volume = 4, + year = 2023} @misc{yang2020coexploration, - title = {Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks}, - author = {Lei Yang and Zheyu Yan and Meng Li and Hyoukjun Kwon and Liangzhen Lai and Tushar Krishna and Vikas Chandra and Weiwen Jiang and Yiyu Shi}, - year = 2020, - eprint = {2002.04116}, archiveprefix = {arXiv}, + author = {Lei Yang and Zheyu Yan and Meng Li and Hyoukjun Kwon and Liangzhen Lai and Tushar Krishna and Vikas Chandra and Weiwen Jiang and Yiyu Shi}, + eprint = {2002.04116}, primaryclass = {cs.LG}, -} + title = {Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks}, + year = 2020} @inproceedings{yang2023online, - title = {Online Model Compression for Federated Learning with Large Models}, - author = {Yang, Tien-Ju and Xiao, Yonghui and Motta, Giovanni and Beaufays, Fran{\c{c}}oise and Mathews, Rajiv and Chen, Mingqing}, - year = 2023, - booktitle = {ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, - pages = {1--5}, + author = {Yang, Tien-Ju and Xiao, Yonghui and Motta, Giovanni and Beaufays, Fran{\c{c}}oise and Mathews, Rajiv and Chen, Mingqing}, + booktitle = {ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, organization = {IEEE}, -} - -@misc{yik2023neurobench, - title = {NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking}, - author = {Jason Yik and Soikat Hasan Ahmed and Zergham Ahmed and Brian Anderson and Andreas G. 
Andreou and Chiara Bartolozzi and Arindam Basu and Douwe den Blanken and Petrut Bogdan and Sander Bohte and Younes Bouhadjar and Sonia Buckley and Gert Cauwenberghs and Federico Corradi and Guido de Croon and Andreea Danielescu and Anurag Daram and Mike Davies and Yigit Demirag and Jason Eshraghian and Jeremy Forest and Steve Furber and Michael Furlong and Aditya Gilra and Giacomo Indiveri and Siddharth Joshi and Vedant Karia and Lyes Khacef and James C. Knight and Laura Kriener and Rajkumar Kubendran and Dhireesha Kudithipudi and Gregor Lenz and Rajit Manohar and Christian Mayr and Konstantinos Michmizos and Dylan Muir and Emre Neftci and Thomas Nowotny and Fabrizio Ottati and Ayca Ozcelikkale and Noah Pacik-Nelson and Priyadarshini Panda and Sun Pao-Sheng and Melika Payvand and Christian Pehle and Mihai A. Petrovici and Christoph Posch and Alpha Renner and Yulia Sandamirskaya and Clemens JS Schaefer and Andr\'{e} van Schaik and Johannes Schemmel and Catherine Schuman and Jae-sun Seo and Sadique Sheik and Sumit Bam Shrestha and Manolis Sifalakis and Amos Sironi and Kenneth Stewart and Terrence C. Stewart and Philipp Stratmann and Guangzhi Tang and Jonathan Timcheck and Marian Verhelst and Craig M. 
Vineyard and Bernhard Vogginger and Amirreza Yousefzadeh and Biyan Zhou and Fatima Tuz Zohora and Charlotte Frenkel and Vijay Janapa Reddi}, - year = 2023, - eprint = {2304.04640}, - archiveprefix = {arXiv}, - primaryclass = {cs.AI}, -} - -@article{young2018recent, - title = {Recent trends in deep learning based natural language processing}, - author = {Young, Tom and Hazarika, Devamanyu and Poria, Soujanya and Cambria, Erik}, - year = 2018, - journal = {ieee Computational intelligenCe magazine}, - publisher = {IEEE}, - volume = 13, - number = 3, - pages = {55--75}, -} + pages = {1--5}, + title = {Online Model Compression for Federated Learning with Large Models}, + year = 2023} @inproceedings{zennaro2022tinyml, - title = {TinyML: applied AI for development}, - author = {Zennaro, Marco and Plancher, Brian and Reddi, V Janapa}, - year = 2022, - booktitle = {The UN 7th Multi-stakeholder Forum on Science, Technology and Innovation for the Sustainable Development Goals}, - pages = {2022--05}, -} + author = {Zennaro, Marco and Plancher, Brian and Reddi, V Janapa}, + booktitle = {The UN 7th Multi-stakeholder Forum on Science, Technology and Innovation for the Sustainable Development Goals}, + pages = {2022--05}, + title = {TinyML: applied AI for development}, + year = 2022} @article{zennarobridging, - title = {Bridging the Digital Divide: the Promising Impact of TinyML for Developing Countries}, - author = {Zennaro, Marco and Plancher, Brian and Reddi, Vijay Janapa}, -} + author = {Zennaro, Marco and Plancher, Brian and Reddi, Vijay Janapa}, + title = {Bridging the Digital Divide: the Promising Impact of TinyML for Developing Countries}} @inproceedings{Zhang_2020_CVPR_Workshops, - title = {Fast Hardware-Aware Neural Architecture Search}, - author = {Zhang, Li Lyna and Yang, Yuqing and Jiang, Yuhang and Zhu, Wenwu and Liu, Yunxin}, - year = 2020, - month = jun, - booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 
Workshops}, -} - -@inproceedings{zhang2015fpga, - title = {FPGA-based Accelerator Design for Deep Convolutional Neural Networks Proceedings of the 2015 ACM}, - author = {Zhang, Chen and Li, Peng and Sun, Guangyu and Guan, Yijin and Xiao, Bingjun and Cong, Jason Optimizing}, - year = 2015, - booktitle = {SIGDA International Symposium on Field-Programmable Gate Arrays-FPGA}, - volume = 15, - pages = {161--170}, -} - -@article{Zhang2017, - title = {Highly wearable cuff-less blood pressure and heart rate monitoring with single-arm electrocardiogram and photoplethysmogram signals}, - author = {Zhang, Qingxue and Zhou, Dian and Zeng, Xuan}, - year = 2017, - month = feb, - day = {06}, - journal = {BioMedical Engineering OnLine}, - volume = 16, - number = 1, - pages = 23, - url = {https://doi.org/10.1186/s12938-017-0317-z}, -} + author = {Zhang, Li Lyna and Yang, Yuqing and Jiang, Yuhang and Zhu, Wenwu and Liu, Yunxin}, + booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, + month = {June}, + title = {Fast Hardware-Aware Neural Architecture Search}, + year = 2020} @misc{zhang2019autoshrink, - title = {AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture}, - author = {Tunhou Zhang and Hsin-Pai Cheng and Zhenwen Li and Feng Yan and Chengyu Huang and Hai Li and Yiran Chen}, - year = 2019, - eprint = {1911.09251}, archiveprefix = {arXiv}, + author = {Tunhou Zhang and Hsin-Pai Cheng and Zhenwen Li and Feng Yan and Chengyu Huang and Hai Li and Yiran Chen}, + eprint = {1911.09251}, primaryclass = {cs.LG}, -} + title = {AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture}, + year = 2019} @article{zhao2018federated, - title = {Federated learning with non-iid data}, - author = {Zhao, Yue and Li, Meng and Lai, Liangzhen and Suda, Naveen and Civin, Damon and Chandra, Vikas}, - year = 2018, - journal = {arXiv preprint arXiv:1806.00582}, -} + author = {Zhao, Yue and Li, 
Meng and Lai, Liangzhen and Suda, Naveen and Civin, Damon and Chandra, Vikas}, + journal = {arXiv preprint arXiv:1806.00582}, + title = {Federated learning with non-iid data}, + year = 2018} @misc{zhou_deep_2023, - title = {Deep {Class}-{Incremental} {Learning}: {A} {Survey}}, - author = {Zhou, Da-Wei and Wang, Qi-Wei and Qi, Zhi-Hong and Ye, Han-Jia and Zhan, De-Chuan and Liu, Ziwei}, - year = 2023, - month = feb, - publisher = {arXiv}, - url = {http://arxiv.org/abs/2302.03648}, - urldate = {2023-10-26}, - note = {arXiv:2302.03648 [cs]}, - language = {en}, -} + annote = {Comment: Code is available at https://github.com/zhoudw-zdw/CIL\_Survey/}, + author = {Zhou, Da-Wei and Wang, Qi-Wei and Qi, Zhi-Hong and Ye, Han-Jia and Zhan, De-Chuan and Liu, Ziwei}, + file = {Zhou et al. - 2023 - Deep Class-Incremental Learning A Survey.pdf:/Users/alex/Zotero/storage/859VZG7W/Zhou et al. - 2023 - Deep Class-Incremental Learning A Survey.pdf:application/pdf}, + keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning}, + language = {en}, + month = feb, + note = {arXiv:2302.03648 [cs]}, + publisher = {arXiv}, + shorttitle = {Deep {Class}-{Incremental} {Learning}}, + title = {Deep {Class}-{Incremental} {Learning}: {A} {Survey}}, + url = {http://arxiv.org/abs/2302.03648}, + urldate = {2023-10-26}, + year = 2023, + Bdsk-Url-1 = {http://arxiv.org/abs/2302.03648}} -@misc{zhou2021analognets, - title = {AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator}, - author = {Chuteng Zhou and Fernando Garcia Redondo and Julian B\"{u}chel and Irem Boybat and Xavier Timoneda Comas and S. R. Nandakumar and Shidhartha Das and Abu Sebastian and Manuel Le Gallo and Paul N. Whatmough}, - year = 2021, - eprint = {2111.06503}, - archiveprefix = {arXiv}, - primaryclass = {cs.AR}, -} +@misc{noauthor_who_nodate, + title = {Who {Invented} the {Microprocessor}? 
- {CHM}}, + url = {https://computerhistory.org/blog/who-invented-the-microprocessor/}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://computerhistory.org/blog/who-invented-the-microprocessor/}} -@article{zhou2022photonic, - title = {Photonic matrix multiplication lights up photonic accelerator and beyond}, - author = {Zhou, Hailong and Dong, Jianji and Cheng, Junwei and Dong, Wenchan and Huang, Chaoran and Shen, Yichen and Zhang, Qiming and Gu, Min and Qian, Chao and Chen, Hongsheng and others}, - year = 2022, - journal = {Light: Science \& Applications}, - publisher = {Nature Publishing Group UK London}, - volume = 11, - number = 1, - pages = 30, -} +@book{weik_survey_1955, + author = {Weik, Martin H.}, + language = {en}, + publisher = {Ballistic Research Laboratories}, + title = {A {Survey} of {Domestic} {Electronic} {Digital} {Computing} {Systems}}, + year = {1955}} + +@inproceedings{brown_language_2020, + abstract = {We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. 
We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.}, + author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario}, + booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, + pages = {1877--1901}, + publisher = {Curran Associates, Inc.}, + title = {Language {Models} are {Few}-{Shot} {Learners}}, + url = {https://proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html}, + urldate = {2023-11-07}, + volume = {33}, + year = {2020}, + Bdsk-Url-1 = {https://proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html}} + +@misc{jia_dissecting_2018, + abstract = {Every year, novel NVIDIA GPU designs are introduced. This rapid architectural and technological progression, coupled with a reluctance by manufacturers to disclose low-level details, makes it difficult for even the most proficient GPU software designers to remain up-to-date with the technological advances at a microarchitectural level. To address this dearth of public, microarchitectural-level information on the novel NVIDIA GPUs, independent researchers have resorted to microbenchmarks-based dissection and discovery. 
This has led to a prolific line of publications that shed light on instruction encoding, and memory hierarchy's geometry and features at each level. Namely, research that describes the performance and behavior of the Kepler, Maxwell and Pascal architectures. In this technical report, we continue this line of research by presenting the microarchitectural details of the NVIDIA Volta architecture, discovered through microbenchmarks and instruction set disassembly. Additionally, we compare quantitatively our Volta findings against its predecessors, Kepler, Maxwell and Pascal.}, + author = {Jia, Zhe and Maggioni, Marco and Staiger, Benjamin and Scarpazza, Daniele P.}, + keywords = {Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance}, + month = apr, + note = {arXiv:1804.06826 [cs]}, + publisher = {arXiv}, + title = {Dissecting the {NVIDIA} {Volta} {GPU} {Architecture} via {Microbenchmarking}}, + url = {http://arxiv.org/abs/1804.06826}, + urldate = {2023-11-07}, + year = {2018}, + Bdsk-Url-1 = {http://arxiv.org/abs/1804.06826}} + +@article{jia2019beyond, + author = {Jia, Zhihao and Zaharia, Matei and Aiken, Alex}, + journal = {Proceedings of Machine Learning and Systems}, + pages = {1--13}, + title = {Beyond Data and Model Parallelism for Deep Neural Networks.}, + volume = {1}, + year = {2019}} + +@inproceedings{raina_large-scale_2009, + address = {Montreal Quebec Canada}, + author = {Raina, Rajat and Madhavan, Anand and Ng, Andrew Y.}, + booktitle = {Proceedings of the 26th {Annual} {International} {Conference} on {Machine} {Learning}}, + doi = {10.1145/1553374.1553486}, + isbn = {978-1-60558-516-1}, + language = {en}, + month = jun, + pages = {873--880}, + publisher = {ACM}, + title = {Large-scale deep unsupervised learning using graphics processors}, + url = {https://dl.acm.org/doi/10.1145/1553374.1553486}, + urldate = {2023-11-07}, + year = {2009}, + Bdsk-Url-1 = {https://dl.acm.org/doi/10.1145/1553374.1553486}, + 
Bdsk-Url-2 = {https://doi.org/10.1145/1553374.1553486}} + +@misc{noauthor_amd_nodate, + title = {{AMD} {Radeon} {RX} 7000 {Series} {Desktop} {Graphics} {Cards}}, + url = {https://www.amd.com/en/graphics/radeon-rx-graphics}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://www.amd.com/en/graphics/radeon-rx-graphics}} + +@misc{noauthor_intel_nodate, + abstract = {Find out how Intel{\textregistered} Arc Graphics unlock lifelike gaming and seamless content creation.}, + journal = {Intel}, + language = {en}, + title = {Intel{\textregistered} {Arc}{\texttrademark} {Graphics} {Overview}}, + url = {https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html}} + +@article{lindholm_nvidia_2008, + abstract = {To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture. 
Its scalable parallel array of processors is massively multithreaded and programmable in C or via graphics APIs.}, + author = {Lindholm, Erik and Nickolls, John and Oberman, Stuart and Montrym, John}, + doi = {10.1109/MM.2008.31}, + issn = {1937-4143}, + journal = {IEEE Micro}, + month = mar, + note = {Conference Name: IEEE Micro}, + number = {2}, + pages = {39--55}, + shorttitle = {{NVIDIA} {Tesla}}, + title = {{NVIDIA} {Tesla}: {A} {Unified} {Graphics} and {Computing} {Architecture}}, + url = {https://ieeexplore.ieee.org/document/4523358}, + urldate = {2023-11-07}, + volume = {28}, + year = {2008}, + Bdsk-Url-1 = {https://ieeexplore.ieee.org/document/4523358}, + Bdsk-Url-2 = {https://doi.org/10.1109/MM.2008.31}} + +@article{dally_evolution_2021, + abstract = {Graphics processing units (GPUs) power today's fastest supercomputers, are the dominant platform for deep learning, and provide the intelligence for devices ranging from self-driving cars to robots and smart cameras. They also generate compelling photorealistic images at real-time frame rates. GPUs have evolved by adding features to support new use cases. NVIDIA's GeForce 256, the first GPU, was a dedicated processor for real-time graphics, an application that demands large amounts of floating-point arithmetic for vertex and fragment shading computations and high memory bandwidth. As real-time graphics advanced, GPUs became programmable. The combination of programmability and floating-point performance made GPUs attractive for running scientific applications. Scientists found ways to use early programmable GPUs by casting their calculations as vertex and fragment shaders. GPUs evolved to meet the needs of scientific users by adding hardware for simpler programming, double-precision floating-point arithmetic, and resilience.}, + author = {Dally, William J. and Keckler, Stephen W. 
and Kirk, David B.}, + doi = {10.1109/MM.2021.3113475}, + issn = {1937-4143}, + journal = {IEEE Micro}, + month = nov, + note = {Conference Name: IEEE Micro}, + number = {6}, + pages = {42--51}, + title = {Evolution of the {Graphics} {Processing} {Unit} ({GPU})}, + url = {https://ieeexplore.ieee.org/document/9623445}, + urldate = {2023-11-07}, + volume = {41}, + year = {2021}, + Bdsk-Url-1 = {https://ieeexplore.ieee.org/document/9623445}, + Bdsk-Url-2 = {https://doi.org/10.1109/MM.2021.3113475}} + +@article{demler_ceva_2020, + author = {Demler, Mike}, + language = {en}, + title = {{CEVA} {SENSPRO} {FUSES} {AI} {AND} {VECTOR} {DSP}}, + year = {2020}} + +@misc{noauthor_google_2023, + abstract = {Tensor G3 on Pixel 8 and Pixel 8 Pro is more helpful, more efficient and more powerful.}, + journal = {Google}, + language = {en-us}, + month = oct, + shorttitle = {Google {Tensor} {G3}}, + title = {Google {Tensor} {G3}: {The} new chip that gives your {Pixel} an {AI} upgrade}, + url = {https://blog.google/products/pixel/google-tensor-g3-pixel-8/}, + urldate = {2023-11-07}, + year = {2023}, + Bdsk-Url-1 = {https://blog.google/products/pixel/google-tensor-g3-pixel-8/}} + +@misc{noauthor_hexagon_nodate, + abstract = {The Hexagon DSP processor has both CPU and DSP functionality to support deeply embedded processing needs of the mobile platform for both multimedia and modem functions.}, + journal = {Qualcomm Developer Network}, + language = {en}, + title = {Hexagon {DSP} {SDK} {Processor}}, + url = {https://developer.qualcomm.com/software/hexagon-dsp-sdk/dsp-processor}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://developer.qualcomm.com/software/hexagon-dsp-sdk/dsp-processor}} + +@misc{noauthor_evolution_2023, + abstract = {To complement the extensive perspective of another Market Update feature article on DSP Products and Applications, published in the November 2020 edition, audioXpress was honored to have the valuable contribution from one of the main suppliers in the 
field. In this article, Youval Nachum, CEVA's Senior Product Marketing Manager, writes about \"The Evolution of Audio DSPs,\" discussing how DSP technology has evolved, its impact on the user experience, and what the future of DSP has in store for us.}, + journal = {audioXpress}, + language = {en}, + month = oct, + title = {The {Evolution} of {Audio} {DSPs}}, + url = {https://audioxpress.com/article/the-evolution-of-audio-dsps}, + urldate = {2023-11-07}, + year = {2023}, + Bdsk-Url-1 = {https://audioxpress.com/article/the-evolution-of-audio-dsps}} + +@article{xiong_mri-based_2021, + abstract = {Brain tumor segmentation is a challenging problem in medical image processing and analysis. It is a very time-consuming and error-prone task. In order to reduce the burden on physicians and improve the segmentation accuracy, the computer-aided detection (CAD) systems need to be developed. Due to the powerful feature learning ability of the deep learning technology, many deep learning-based methods have been applied to the brain tumor segmentation CAD systems and achieved satisfactory accuracy. However, deep learning neural networks have high computational complexity, and the brain tumor segmentation process consumes significant time. 
Therefore, in order to achieve the high segmentation accuracy of brain tumors and obtain the segmentation results efficiently, it is very demanding to speed up the segmentation process of brain tumors.}, + author = {Xiong, Siyu and Wu, Guoqing and Fan, Xitian and Feng, Xuan and Huang, Zhongcheng and Cao, Wei and Zhou, Xuegong and Ding, Shijin and Yu, Jinhua and Wang, Lingli and Shi, Zhifeng}, + doi = {10.1186/s12859-021-04347-6}, + issn = {1471-2105}, + journal = {BMC Bioinformatics}, + keywords = {Brain tumor segmentation, FPGA acceleration, Neural network}, + month = sep, + number = {1}, + pages = {421}, + title = {{MRI}-based brain tumor segmentation using {FPGA}-accelerated neural network}, + url = {https://doi.org/10.1186/s12859-021-04347-6}, + urldate = {2023-11-07}, + volume = {22}, + year = {2021}, + Bdsk-Url-1 = {https://doi.org/10.1186/s12859-021-04347-6}} + +@article{gwennap_certus-nx_nodate, + author = {Gwennap, Linley}, + language = {en}, + title = {Certus-{NX} {Innovates} {General}-{Purpose} {FPGAs}}} + +@misc{noauthor_fpga_nodate, + title = {{FPGA} {Architecture} {Overview}}, + url = {https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/fpga-architecture-overview.html}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/optimization-guide/2023-1/fpga-architecture-overview.html}} + +@misc{noauthor_what_nodate, + abstract = {What is an FPGA - Field Programmable Gate Arrays are semiconductor devices that are based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be reprogrammed to desired application or functionality requirements after manufacturing.}, + journal = {AMD}, + language = {en}, + shorttitle = {What is an {FPGA}?}, + title = {What is an {FPGA}? 
{Field} {Programmable} {Gate} {Array}}, + url = {https://www.xilinx.com/products/silicon-devices/fpga/what-is-an-fpga.html}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://www.xilinx.com/products/silicon-devices/fpga/what-is-an-fpga.html}} + +@article{putnam_reconfigurable_2014, + abstract = {Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. + In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by a factor of 95\% for a fixed latency distribution--- or, while maintaining equivalent throughput, reduces the tail latency by 29\%}, + author = {Putnam, Andrew and Caulfield, Adrian M. and Chung, Eric S. 
and Chiou, Derek and Constantinides, Kypros and Demme, John and Esmaeilzadeh, Hadi and Fowers, Jeremy and Gopal, Gopi Prashanth and Gray, Jan and Haselman, Michael and Hauck, Scott and Heil, Stephen and Hormati, Amir and Kim, Joo-Young and Lanka, Sitaram and Larus, James and Peterson, Eric and Pope, Simon and Smith, Aaron and Thong, Jason and Xiao, Phillip Yi and Burger, Doug}, + doi = {10.1145/2678373.2665678}, + issn = {0163-5964}, + journal = {ACM SIGARCH Computer Architecture News}, + language = {en}, + month = oct, + number = {3}, + pages = {13--24}, + title = {A reconfigurable fabric for accelerating large-scale datacenter services}, + url = {https://dl.acm.org/doi/10.1145/2678373.2665678}, + urldate = {2023-11-07}, + volume = {42}, + year = {2014}, + Bdsk-Url-1 = {https://dl.acm.org/doi/10.1145/2678373.2665678}, + Bdsk-Url-2 = {https://doi.org/10.1145/2678373.2665678}} + +@misc{noauthor_project_nodate, + title = {Project {Catapult} - {Microsoft} {Research}}, + url = {https://www.microsoft.com/en-us/research/project/project-catapult/}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://www.microsoft.com/en-us/research/project/project-catapult/}} + +@misc{dean_jeff_numbers_nodate, + author = {Dean, Jeff}, + title = {Numbers {Everyone} {Should} {Know}}, + url = {https://brenocon.com/dean_perf.html}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://brenocon.com/dean_perf.html}} + +@misc{bailey_enabling_2018, + abstract = {Enabling Cheaper Design, At what point does cheaper design enable a significant growth in custom semiconductor content? 
Not everyone is onboard with the idea.}, + author = {Bailey, Brian}, + journal = {Semiconductor Engineering}, + language = {en-US}, + month = sep, + title = {Enabling {Cheaper} {Design}}, + url = {https://semiengineering.com/enabling-cheaper-design/}, + urldate = {2023-11-07}, + year = {2018}, + Bdsk-Url-1 = {https://semiengineering.com/enabling-cheaper-design/}} + +@misc{noauthor_integrated_2023, + abstract = {An integrated circuit (also known as an IC, a chip, or a microchip) is a set of electronic circuits on one small flat piece of semiconductor material, usually silicon. Large numbers of miniaturized transistors and other electronic components are integrated together on the chip. This results in circuits that are orders of magnitude smaller, faster, and less expensive than those constructed of discrete components, allowing a large transistor count. +The IC's mass production capability, reliability, and building-block approach to integrated circuit design have ensured the rapid adoption of standardized ICs in place of designs using discrete transistors. ICs are now used in virtually all electronic equipment and have revolutionized the world of electronics. Computers, mobile phones and other home appliances are now essential parts of the structure of modern societies, made possible by the small size and low cost of ICs such as modern computer processors and microcontrollers. +Very-large-scale integration was made practical by technological advancements in semiconductor device fabrication. Since their origins in the 1960s, the size, speed, and capacity of chips have progressed enormously, driven by technical advances that fit more and more transistors on chips of the same size -- a modern chip may have many billions of transistors in an area the size of a human fingernail. These advances, roughly following Moore's law, make the computer chips of today possess millions of times the capacity and thousands of times the speed of the computer chips of the early 1970s. 
+ICs have three main advantages over discrete circuits: size, cost and performance. The size and cost is low because the chips, with all their components, are printed as a unit by photolithography rather than being constructed one transistor at a time. Furthermore, packaged ICs use much less material than discrete circuits. Performance is high because the IC's components switch quickly and consume comparatively little power because of their small size and proximity. The main disadvantage of ICs is the high initial cost of designing them and the enormous capital cost of factory construction. This high initial cost means ICs are only commercially viable when high production volumes are anticipated.}, + copyright = {Creative Commons Attribution-ShareAlike License}, + journal = {Wikipedia}, + language = {en}, + month = nov, + note = {Page Version ID: 1183537457}, + title = {Integrated circuit}, + url = {https://en.wikipedia.org/w/index.php?title=Integrated_circuit&oldid=1183537457}, + urldate = {2023-11-07}, + year = {2023}, + Bdsk-Url-1 = {https://en.wikipedia.org/w/index.php?title=Integrated_circuit&oldid=1183537457}} + +@article{el-rayis_reconfigurable_nodate, + author = {El-Rayis, Ahmed Osman}, + language = {en}, + title = {Reconfigurable {Architectures} for the {Next} {Generation} of {Mobile} {Device} {Telecommunications} {Systems}}} + +@misc{noauthor_intel_nodate, + abstract = {View Intel{\textregistered} Stratix{\textregistered} 10 NX FPGAs and find product specifications, features, applications and more.}, + journal = {Intel}, + language = {en}, + title = {Intel{\textregistered} {Stratix}{\textregistered} 10 {NX} {FPGA} {Overview} - {High} {Performance} {Stratix}{\textregistered} {FPGA}}, + url = {https://www.intel.com/content/www/us/en/products/details/fpga/stratix/10/nx.html}, + urldate = {2023-11-07}, + Bdsk-Url-1 = {https://www.intel.com/content/www/us/en/products/details/fpga/stratix/10/nx.html}} + +@book{patterson2016computer, + author = {Patterson, David 
A and Hennessy, John L}, + publisher = {Morgan Kaufmann}, + title = {Computer organization and design ARM edition: the hardware software interface}, + year = {2016}} + +@article{xiu2019time, + author = {Xiu, Liming}, + journal = {IEEE Solid-State Circuits Magazine}, + number = {1}, + pages = {39--55}, + publisher = {IEEE}, + title = {Time Moore: Exploiting Moore's Law from the perspective of time}, + volume = {11}, + year = {2019}} + +@article{brown2020language, + author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others}, + journal = {Advances in neural information processing systems}, + pages = {1877--1901}, + title = {Language models are few-shot learners}, + volume = {33}, + year = {2020}} + +@article{cheng2017survey, + author = {Cheng, Yu and Wang, Duo and Zhou, Pan and Zhang, Tao}, + journal = {arXiv preprint arXiv:1710.09282}, + title = {A survey of model compression and acceleration for deep neural networks}, + year = {2017}} + +@article{sze2017efficient, + author = {Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel S}, + journal = {Proceedings of the IEEE}, + number = {12}, + pages = {2295--2329}, + publisher = {IEEE}, + title = {Efficient processing of deep neural networks: A tutorial and survey}, + volume = {105}, + year = {2017}} + +@article{young2018recent, + author = {Young, Tom and Hazarika, Devamanyu and Poria, Soujanya and Cambria, Erik}, + journal = {IEEE Computational Intelligence Magazine}, + number = {3}, + pages = {55--75}, + publisher = {IEEE}, + title = {Recent trends in deep learning based natural language processing}, + volume = {13}, + year = {2018}} + +@inproceedings{jacob2018quantization, + author = {Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry}, + booktitle = 
{Proceedings of the IEEE conference on computer vision and pattern recognition}, + pages = {2704--2713}, + title = {Quantization and training of neural networks for efficient integer-arithmetic-only inference}, + year = {2018}} + +@article{gale2019state, + author = {Gale, Trevor and Elsen, Erich and Hooker, Sara}, + journal = {arXiv preprint arXiv:1902.09574}, + title = {The state of sparsity in deep neural networks}, + year = {2019}} + +@inproceedings{zhang2015fpga, + author = {Zhang, Chen and Li, Peng and Sun, Guangyu and Guan, Yijin and Xiao, Bingjun and Cong, Jason}, + booktitle = {Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays}, + pages = {161--170}, + title = {Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks}, + volume = {15}, + year = {2015}} + +@inproceedings{suda2016throughput, + author = {Suda, Naveen and Chandra, Vikas and Dasika, Ganesh and Mohanty, Abinash and Ma, Yufei and Vrudhula, Sarma and Seo, Jae-sun and Cao, Yu}, + booktitle = {Proceedings of the 2016 ACM/SIGDA international symposium on field-programmable gate arrays}, + pages = {16--25}, + title = {Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks}, + year = {2016}} + +@inproceedings{fowers2018configurable, + author = {Fowers, Jeremy and Ovtcharov, Kalin and Papamichael, Michael and Massengill, Todd and Liu, Ming and Lo, Daniel and Alkalay, Shlomi and Haselman, Michael and Adams, Logan and Ghandi, Mahdi and others}, + booktitle = {2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)}, + organization = {IEEE}, + pages = {1--14}, + title = {A configurable cloud-scale DNN processor for real-time AI}, + year = {2018}} + +@article{jia2019beyond, + author = {Jia, Zhihao and Zaharia, Matei and Aiken, Alex}, + journal = {Proceedings of Machine Learning and Systems}, + pages = {1--13}, + title = {Beyond Data and Model Parallelism for Deep Neural 
Networks.}, + volume = {1}, + year = {2019}} @inproceedings{zhu2018benchmarking, - title = {Benchmarking and analyzing deep neural network training}, - author = {Zhu, Hongyu and Akrout, Mohamed and Zheng, Bojian and Pelegris, Andrew and Jayarajan, Anand and Phanishayee, Amar and Schroeder, Bianca and Pekhimenko, Gennady}, - year = 2018, - booktitle = {2018 IEEE International Symposium on Workload Characterization (IISWC)}, - pages = {88--100}, + author = {Zhu, Hongyu and Akrout, Mohamed and Zheng, Bojian and Pelegris, Andrew and Jayarajan, Anand and Phanishayee, Amar and Schroeder, Bianca and Pekhimenko, Gennady}, + booktitle = {2018 IEEE International Symposium on Workload Characterization (IISWC)}, organization = {IEEE}, -} + pages = {88--100}, + title = {Benchmarking and analyzing deep neural network training}, + year = {2018}} -@article{zhuang_comprehensive_2021, - title = {A {Comprehensive} {Survey} on {Transfer} {Learning}}, - author = {Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, - year = 2021, - month = jan, - journal = {Proceedings of the IEEE}, - volume = 109, - number = 1, - pages = {43--76}, - url = {https://ieeexplore.ieee.org/document/9134370/}, - urldate = {2023-10-25}, - language = {en}, -} +@article{samajdar2018scale, + author = {Samajdar, Ananda and Zhu, Yuhao and Whatmough, Paul and Mattina, Matthew and Krishna, Tushar}, + journal = {arXiv preprint arXiv:1811.02883}, + title = {Scale-sim: Systolic cnn accelerator simulator}, + year = {2018}} + +@inproceedings{munshi2009opencl, + author = {Munshi, Aaftab}, + booktitle = {2009 IEEE Hot Chips 21 Symposium (HCS)}, + doi = {10.1109/HOTCHIPS.2009.7478342}, + pages = {1-314}, + title = {The OpenCL specification}, + year = {2009}, + Bdsk-Url-1 = {https://doi.org/10.1109/HOTCHIPS.2009.7478342}} + +@inproceedings{luebke2008cuda, + author = {Luebke, David}, + booktitle = {2008 5th IEEE International Symposium on Biomedical 
Imaging: From Nano to Macro}, + doi = {10.1109/ISBI.2008.4541126}, + pages = {836-838}, + title = {CUDA: Scalable parallel programming for high-performance scientific computing}, + year = {2008}, + Bdsk-Url-1 = {https://doi.org/10.1109/ISBI.2008.4541126}} + +@misc{segal1999opengl, + author = {Segal, Mark and Akeley, Kurt}, + title = {The OpenGL graphics system: A specification (version 1.1)}, + year = {1999}} + +@inproceedings{gannot1994verilog, + author = {Gannot, G. and Ligthart, M.}, + booktitle = {International Verilog HDL Conference}, + doi = {10.1109/IVC.1994.323743}, + pages = {86-92}, + title = {Verilog HDL based FPGA design}, + year = {1994}, + Bdsk-Url-1 = {https://doi.org/10.1109/IVC.1994.323743}} + +@article{binkert2011gem5, + author = {Binkert, Nathan and Beckmann, Bradford and Black, Gabriel and Reinhardt, Steven K and Saidi, Ali and Basu, Arkaprava and Hestness, Joel and Hower, Derek R and Krishna, Tushar and Sardashti, Somayeh and others}, + journal = {ACM SIGARCH computer architecture news}, + number = {2}, + pages = {1--7}, + publisher = {ACM New York, NY, USA}, + title = {The gem5 simulator}, + volume = {39}, + year = {2011}} + +@article{Vivet2021, + author = {Vivet, Pascal and Guthmuller, Eric and Thonnart, Yvain and Pillonnet, Gael and Fuguet, C{\'e}sar and Miro-Panades, Ivan and Moritz, Guillaume and Durupt, Jean and Bernard, Christian and Varreau, Didier and Pontes, Julian and Thuries, S{\'e}bastien and Coriat, David and Harrand, Michel and Dutoit, Denis and Lattard, Didier and Arnaud, Lucile and Charbonnier, Jean and Coudrain, Perceval and Garnier, Arnaud and Berger, Fr{\'e}d{\'e}ric and Gueugnot, Alain and Greiner, Alain and Meunier, Quentin L. 
and Farcy, Alexis and Arriordaz, Alexandre and Ch{\'e}ramy, S{\'e}verine and Clermidy, Fabien}, + doi = {10.1109/JSSC.2020.3036341}, + journal = {IEEE Journal of Solid-State Circuits}, + number = {1}, + pages = {79-97}, + title = {IntAct: A 96-Core Processor With Six Chiplets 3D-Stacked on an Active Interposer With Distributed Interconnects and Integrated Power Management}, + volume = {56}, + year = {2021}, + Bdsk-Url-1 = {https://doi.org/10.1109/JSSC.2020.3036341}} + +@article{schuman2022, + author = {Schuman, Catherine D and Kulkarni, Shruti R and Parsa, Maryam and Mitchell, J Parker and Date, Prasanna and Kay, Bill}, + journal = {Nature Computational Science}, + number = {1}, + pages = {10--19}, + publisher = {Nature Publishing Group US New York}, + title = {Opportunities for neuromorphic computing algorithms and applications}, + volume = {2}, + year = {2022}} + +@article{markovic2020, + author = {Markovi{\'c}, Danijela and Mizrahi, Alice and Querlioz, Damien and Grollier, Julie}, + journal = {Nature Reviews Physics}, + number = {9}, + pages = {499--510}, + publisher = {Nature Publishing Group UK London}, + title = {Physics for neuromorphic computing}, + volume = {2}, + year = {2020}} + +@article{furber2016large, + author = {Furber, Steve}, + journal = {Journal of neural engineering}, + number = {5}, + pages = {051001}, + publisher = {IOP Publishing}, + title = {Large-scale neuromorphic computing systems}, + volume = {13}, + year = {2016}} + +@article{davies2018loihi, + author = {Davies, Mike and Srinivasa, Narayan and Lin, Tsung-Han and Chinya, Gautham and Cao, Yongqiang and Choday, Sri Harsha and Dimou, Georgios and Joshi, Prasad and Imam, Nabil and Jain, Shweta and others}, + journal = {IEEE Micro}, + number = {1}, + pages = {82--99}, + publisher = {IEEE}, + title = {Loihi: A neuromorphic manycore processor with on-chip learning}, + volume = {38}, + year = {2018}} + +@article{davies2021advancing, + author = {Davies, Mike and Wild, Andreas and Orchard, Garrick 
and Sandamirskaya, Yulia and Guerra, Gabriel A Fonseca and Joshi, Prasad and Plank, Philipp and Risbud, Sumedh R}, + journal = {Proceedings of the IEEE}, + number = {5}, + pages = {911--934}, + publisher = {IEEE}, + title = {Advancing neuromorphic computing with loihi: A survey of results and outlook}, + volume = {109}, + year = {2021}} + +@article{modha2023neural, + author = {Modha, Dharmendra S and Akopyan, Filipp and Andreopoulos, Alexander and Appuswamy, Rathinakumar and Arthur, John V and Cassidy, Andrew S and Datta, Pallab and DeBole, Michael V and Esser, Steven K and Otero, Carlos Ortega and others}, + journal = {Science}, + number = {6668}, + pages = {329--335}, + publisher = {American Association for the Advancement of Science}, + title = {Neural inference at the frontier of energy, space, and time}, + volume = {382}, + year = {2023}} + +@article{maass1997networks, + author = {Maass, Wolfgang}, + journal = {Neural networks}, + number = {9}, + pages = {1659--1671}, + publisher = {Elsevier}, + title = {Networks of spiking neurons: the third generation of neural network models}, + volume = {10}, + year = {1997}} + +@article{10242251, + author = {Eshraghian, Jason K. and Ward, Max and Neftci, Emre O. 
and Wang, Xinxin and Lenz, Gregor and Dwivedi, Girish and Bennamoun, Mohammed and Jeong, Doo Seok and Lu, Wei D.}, + doi = {10.1109/JPROC.2023.3308088}, + journal = {Proceedings of the IEEE}, + number = {9}, + pages = {1016-1054}, + title = {Training Spiking Neural Networks Using Lessons From Deep Learning}, + volume = {111}, + year = {2023}, + Bdsk-Url-1 = {https://doi.org/10.1109/JPROC.2023.3308088}} + +@article{chua1971memristor, + author = {Chua, Leon}, + journal = {IEEE Transactions on circuit theory}, + number = {5}, + pages = {507--519}, + publisher = {IEEE}, + title = {Memristor-the missing circuit element}, + volume = {18}, + year = {1971}} + +@article{shastri2021photonics, + author = {Shastri, Bhavin J and Tait, Alexander N and Ferreira de Lima, Thomas and Pernice, Wolfram HP and Bhaskaran, Harish and Wright, C David and Prucnal, Paul R}, + journal = {Nature Photonics}, + number = {2}, + pages = {102--114}, + publisher = {Nature Publishing Group UK London}, + title = {Photonics for artificial intelligence and neuromorphic computing}, + volume = {15}, + year = {2021}} + +@article{haensch2018next, + author = {Haensch, Wilfried and Gokmen, Tayfun and Puri, Ruchir}, + journal = {Proceedings of the IEEE}, + number = {1}, + pages = {108--122}, + publisher = {IEEE}, + title = {The next generation of deep learning hardware: Analog computing}, + volume = {107}, + year = {2018}} + +@article{hazan2021neuromorphic, + author = {Hazan, Avi and Ezra Tsur, Elishai}, + journal = {Frontiers in Neuroscience}, + pages = {627221}, + publisher = {Frontiers Media SA}, + title = {Neuromorphic analog implementation of neural engineering framework-inspired spiking neuron for high-dimensional representation}, + volume = {15}, + year = {2021}} + +@article{gates2009flexible, + author = {Gates, Byron D}, + journal = {Science}, + number = {5921}, + pages = {1566--1567}, + publisher = {American Association for the Advancement of Science}, + title = {Flexible electronics}, + volume = 
{323}, + year = {2009}} + +@article{musk2019integrated, + author = {Musk, Elon and others}, + journal = {Journal of medical Internet research}, + number = {10}, + pages = {e16194}, + publisher = {JMIR Publications Inc., Toronto, Canada}, + title = {An integrated brain-machine interface platform with thousands of channels}, + volume = {21}, + year = {2019}} + +@article{tang2023flexible, + author = {Tang, Xin and Shen, Hao and Zhao, Siyuan and Li, Na and Liu, Jia}, + journal = {Nature Electronics}, + number = {2}, + pages = {109--118}, + publisher = {Nature Publishing Group UK London}, + title = {Flexible brain--computer interfaces}, + volume = {6}, + year = {2023}} + +@article{tang2022soft, + author = {Tang, Xin and He, Yichun and Liu, Jia}, + journal = {Biophysics Reviews}, + number = {1}, + publisher = {AIP Publishing}, + title = {Soft bioelectronics for cardiac interfaces}, + volume = {3}, + year = {2022}} + +@article{kwon2022flexible, + author = {Kwon, Sun Hwa and Dong, Lin}, + journal = {Nano Energy}, + pages = {107632}, + publisher = {Elsevier}, + title = {Flexible sensors and machine learning for heart monitoring}, + year = {2022}} + +@article{huang2010pseudo, + author = {Huang, Tsung-Ching and Fukuda, Kenjiro and Lo, Chun-Ming and Yeh, Yung-Hui and Sekitani, Tsuyoshi and Someya, Takao and Cheng, Kwang-Ting}, + journal = {IEEE Transactions on Electron Devices}, + number = {1}, + pages = {141--150}, + publisher = {IEEE}, + title = {Pseudo-CMOS: A design style for low-cost and robust flexible electronics}, + volume = {58}, + year = {2010}} + +@article{biggs2021natively, + author = {Biggs, John and Myers, James and Kufel, Jedrzej and Ozer, Emre and Craske, Simon and Sou, Antony and Ramsdale, Catherine and Williamson, Ken and Price, Richard and White, Scott}, + journal = {Nature}, + number = {7868}, + pages = {532--536}, + publisher = {Nature Publishing Group UK London}, + title = {A natively flexible 32-bit Arm microprocessor}, + volume = {595}, + year = {2021}} 
+ +@article{farah2005neuroethics, + author = {Farah, Martha J}, + journal = {Trends in cognitive sciences}, + number = {1}, + pages = {34--40}, + publisher = {Elsevier}, + title = {Neuroethics: the practical and the philosophical}, + volume = {9}, + year = {2005}} + +@article{segura2018ethical, + author = {Segura Anaya, LH and Alsadoon, Abeer and Costadopoulos, Nectar and Prasad, PWC}, + journal = {Science and engineering ethics}, + pages = {1--28}, + publisher = {Springer}, + title = {Ethical implications of user perceptions of wearable devices}, + volume = {24}, + year = {2018}} + +@article{goodyear2017social, + author = {Goodyear, Victoria A}, + journal = {Qualitative research in sport, exercise and health}, + number = {3}, + pages = {285--302}, + publisher = {Taylor \& Francis}, + title = {Social media, apps and wearable technologies: navigating ethical dilemmas and procedures}, + volume = {9}, + year = {2017}} + +@article{roskies2002neuroethics, + author = {Roskies, Adina}, + journal = {Neuron}, + number = {1}, + pages = {21--23}, + publisher = {Elsevier}, + title = {Neuroethics for the new millenium}, + volume = {35}, + year = {2002}} + +@article{duarte2022fastml, + author = {Duarte, Javier and Tran, Nhan and Hawks, Ben and Herwig, Christian and Muhizi, Jules and Prakash, Shvetank and Reddi, Vijay Janapa}, + journal = {arXiv preprint arXiv:2207.07958}, + title = {FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning}, + year = {2022}} + +@article{verma2019memory, + author = {Verma, Naveen and Jia, Hongyang and Valavi, Hossein and Tang, Yinqi and Ozatay, Murat and Chen, Lung-Yen and Zhang, Bonan and Deaville, Peter}, + journal = {IEEE Solid-State Circuits Magazine}, + number = {3}, + pages = {43--55}, + publisher = {IEEE}, + title = {In-memory computing: Advances and prospects}, + volume = {11}, + year = {2019}} + +@article{chi2016prime, + author = {Chi, Ping and Li, Shuangchen and Xu, Cong and Zhang, Tao and Zhao, Jishen and Liu, 
Yongpan and Wang, Yu and Xie, Yuan}, + journal = {ACM SIGARCH Computer Architecture News}, + number = {3}, + pages = {27--39}, + publisher = {ACM New York, NY, USA}, + title = {Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory}, + volume = {44}, + year = {2016}} + +@article{burr2016recent, + author = {Burr, Geoffrey W and Brightsky, Matthew J and Sebastian, Abu and Cheng, Huai-Yu and Wu, Jau-Yi and Kim, Sangbum and Sosa, Norma E and Papandreou, Nikolaos and Lung, Hsiang-Lan and Pozidis, Haralampos and others}, + journal = {IEEE Journal on Emerging and Selected Topics in Circuits and Systems}, + number = {2}, + pages = {146--162}, + publisher = {IEEE}, + title = {Recent progress in phase-change memory technology}, + volume = {6}, + year = {2016}} + +@article{loh20083d, + author = {Loh, Gabriel H}, + journal = {ACM SIGARCH computer architecture news}, + number = {3}, + pages = {453--464}, + publisher = {ACM New York, NY, USA}, + title = {3D-stacked memory architectures for multi-core processors}, + volume = {36}, + year = {2008}} + +@article{mittal2021survey, + author = {Mittal, Sparsh and Verma, Gaurav and Kaushik, Brajesh and Khanday, Farooq A}, + journal = {Journal of Systems Architecture}, + pages = {102276}, + publisher = {Elsevier}, + title = {A survey of SRAM-based in-memory computing techniques and applications}, + volume = {119}, + year = {2021}} + +@article{wong2012metal, + author = {Wong, H-S Philip and Lee, Heng-Yuan and Yu, Shimeng and Chen, Yu-Sheng and Wu, Yi and Chen, Pang-Shiu and Lee, Byoungil and Chen, Frederick T and Tsai, Ming-Jinn}, + journal = {Proceedings of the IEEE}, + number = {6}, + pages = {1951--1970}, + publisher = {IEEE}, + title = {Metal--oxide RRAM}, + volume = {100}, + year = {2012}} + +@inproceedings{imani2016resistive, + author = {Imani, Mohsen and Rahimi, Abbas and Rosing, Tajana S}, + booktitle = {2016 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)}, 
+ organization = {IEEE}, + pages = {1327--1332}, + title = {Resistive configurable associative memory for approximate computing}, + year = {2016}} + +@article{miller2000optical, + author = {Miller, David AB}, + journal = {IEEE Journal of Selected Topics in Quantum Electronics}, + number = {6}, + pages = {1312--1317}, + publisher = {IEEE}, + title = {Optical interconnects to silicon}, + volume = {6}, + year = {2000}} + +@article{zhou2022photonic, + author = {Zhou, Hailong and Dong, Jianji and Cheng, Junwei and Dong, Wenchan and Huang, Chaoran and Shen, Yichen and Zhang, Qiming and Gu, Min and Qian, Chao and Chen, Hongsheng and others}, + journal = {Light: Science \& Applications}, + number = {1}, + pages = {30}, + publisher = {Nature Publishing Group UK London}, + title = {Photonic matrix multiplication lights up photonic accelerator and beyond}, + volume = {11}, + year = {2022}} + +@article{bains2020business, + author = {Bains, Sunny}, + journal = {Nature Electronics}, + number = {7}, + pages = {348--351}, + title = {The business of building brains}, + volume = {3}, + year = {2020}} + +@article{Hennessy2019-je, + abstract = {Innovations like domain-specific hardware, enhanced security, + open instruction sets, and agile chip development will lead the + way.}, + author = {Hennessy, John L and Patterson, David A}, + copyright = {http://www.acm.org/publications/policies/copyright\_policy\#Background}, + journal = {Commun. 
ACM}, + language = {en}, + month = jan, + number = 2, + pages = {48--60}, + publisher = {Association for Computing Machinery (ACM)}, + title = {A new golden age for computer architecture}, + volume = 62, + year = 2019} + +@article{Dongarra2009-na, + author = {Dongarra, Jack J}, + journal = {IBM Journal of Research and Development}, + pages = {3--4}, + title = {The evolution of high performance computing on system z}, + volume = 53, + year = 2009} + +@article{Ranganathan2011-dc, + author = {Ranganathan, Parthasarathy}, + journal = {Computer (Long Beach Calif.)}, + month = jan, + number = 1, + pages = {39--48}, + publisher = {Institute of Electrical and Electronics Engineers (IEEE)}, + title = {From microprocessors to nanostores: Rethinking data-centric systems}, + volume = 44, + year = 2011} + +@article{Ignatov2018-kh, + abstract = {Over the last years, the computational power of mobile devices + such as smartphones and tablets has grown dramatically, reaching + the level of desktop computers available not long ago. While + standard smartphone apps are no longer a problem for them, there + is still a group of tasks that can easily challenge even + high-end devices, namely running artificial intelligence + algorithms. In this paper, we present a study of the current + state of deep learning in the Android ecosystem and describe + available frameworks, programming models and the limitations of + running AI on smartphones. We give an overview of the hardware + acceleration resources available on four main mobile chipset + platforms: Qualcomm, HiSilicon, MediaTek and Samsung. 
+ Additionally, we present the real-world performance results of + different mobile SoCs collected with AI Benchmark that are + covering all main existing hardware configurations.}, + author = {Ignatov, Andrey and Timofte, Radu and Chou, William and Wang, Ke and Wu, Max and Hartley, Tim and Van Gool, Luc}, + publisher = {arXiv}, + title = {{AI} Benchmark: Running deep neural networks on Android smartphones}, + year = 2018} + +@article{Sze2017-ak, + abstract = {Deep neural networks (DNNs) are currently widely used for + many artificial intelligence (AI) applications including + computer vision, speech recognition, and robotics. While + DNNs deliver state-of-the-art accuracy on many AI tasks, it + comes at the cost of high computational complexity. + Accordingly, techniques that enable efficient processing of + DNNs to improve energy efficiency and throughput without + sacrificing application accuracy or increasing hardware cost + are critical to the wide deployment of DNNs in AI systems. + This article aims to provide a comprehensive tutorial and + survey about the recent advances towards the goal of + enabling efficient processing of DNNs. Specifically, it will + provide an overview of DNNs, discuss various hardware + platforms and architectures that support DNNs, and highlight + key trends in reducing the computation cost of DNNs either + solely via hardware design changes or via joint hardware + design and DNN algorithm changes. It will also summarize + various development resources that enable researchers and + practitioners to quickly get started in this field, and + highlight important benchmarking metrics and design + considerations that should be used for evaluating the + rapidly growing number of DNN hardware designs, optionally + including algorithmic co-designs, being proposed in academia + and industry. 
The reader will take away the following + concepts from this article: understand the key design + considerations for DNNs; be able to evaluate different DNN + hardware implementations with benchmarks and comparison + metrics; understand the trade-offs between various hardware + architectures and platforms; be able to evaluate the utility + of various DNN design techniques for efficient processing; + and understand recent implementation trends and + opportunities.}, + archiveprefix = {arXiv}, + author = {Sze, Vivienne and Chen, Yu-Hsin and Yang, Tien-Ju and Emer, Joel}, + copyright = {http://arxiv.org/licenses/nonexclusive-distrib/1.0/}, + eprint = {1703.09039}, + month = mar, + primaryclass = {cs.CV}, + title = {Efficient processing of deep neural networks: A tutorial and survey}, + year = 2017} + +@inproceedings{lin2022ondevice, + author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, + booktitle = {ArXiv}, + title = {On-Device Training Under 256KB Memory}, + year = {2022}} + +@article{lin2023awq, + author = {Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song}, + journal = {arXiv}, + title = {AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration}, + year = {2023}} + +@inproceedings{wang2020apq, + author = {Wang, Tianzhe and Wang, Kuan and Cai, Han and Lin, Ji and Liu, Zhijian and Wang, Hanrui and Lin, Yujun and Han, Song}, + booktitle = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + doi = {10.1109/CVPR42600.2020.00215}, + pages = {2075-2084}, + title = {APQ: Joint Search for Network Architecture, Pruning and Quantization Policy}, + year = {2020}, + Bdsk-Url-1 = {https://doi.org/10.1109/CVPR42600.2020.00215}} + +@inproceedings{Li2020Additive, + author = {Yuhang Li and Xin Dong and Wei Wang}, + booktitle = {International Conference on Learning Representations}, + title = {Additive Powers-of-Two Quantization: An Efficient 
Non-uniform Discretization for Neural Networks}, + url = {https://openreview.net/forum?id=BkgXT24tDS}, + year = {2020}, + Bdsk-Url-1 = {https://openreview.net/forum?id=BkgXT24tDS}} + +@article{janapa2023edge, + author = {Janapa Reddi, Vijay and Elium, Alexander and Hymel, Shawn and Tischler, David and Situnayake, Daniel and Ward, Carl and Moreau, Louis and Plunkett, Jenny and Kelcey, Matthew and Baaijens, Mathijs and others}, + journal = {Proceedings of Machine Learning and Systems}, + title = {Edge Impulse: An MLOps Platform for Tiny Machine Learning}, + volume = {5}, + year = {2023}} @article{zhuang2020comprehensive, - title = {A comprehensive survey on transfer learning}, - author = {Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, - year = 2020, - journal = {Proceedings of the IEEE}, - publisher = {IEEE}, - volume = 109, - number = 1, - pages = {43--76}, -} + author = {Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, + journal = {Proceedings of the IEEE}, + number = {1}, + pages = {43--76}, + publisher = {IEEE}, + title = {A comprehensive survey on transfer learning}, + volume = {109}, + year = {2020}} + +@article{zhuang_comprehensive_2021, + author = {Zhuang, Fuzhen and Qi, Zhiyuan and Duan, Keyu and Xi, Dongbo and Zhu, Yongchun and Zhu, Hengshu and Xiong, Hui and He, Qing}, + doi = {10.1109/JPROC.2020.3004555}, + file = {Zhuang et al. - 2021 - A Comprehensive Survey on Transfer Learning.pdf:/Users/alex/Zotero/storage/CHJB2WE4/Zhuang et al. 
- 2021 - A Comprehensive Survey on Transfer Learning.pdf:application/pdf}, + issn = {0018-9219, 1558-2256}, + journal = {Proceedings of the IEEE}, + language = {en}, + month = jan, + number = {1}, + pages = {43--76}, + title = {A {Comprehensive} {Survey} on {Transfer} {Learning}}, + url = {https://ieeexplore.ieee.org/document/9134370/}, + urldate = {2023-10-25}, + volume = {109}, + year = {2021}, + Bdsk-Url-1 = {https://ieeexplore.ieee.org/document/9134370/}, + Bdsk-Url-2 = {https://doi.org/10.1109/JPROC.2020.3004555}} + +@inproceedings{Norman2017TPUv1, + abstract = {Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU) --- deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95\% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X -- 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X -- 80X higher. 
Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.}, + address = {New York, NY, USA}, + author = {Jouppi, Norman P. and Young, Cliff and Patil, Nishant and Patterson, David and Agrawal, Gaurav and Bajwa, Raminder and Bates, Sarah and Bhatia, Suresh and Boden, Nan and Borchers, Al and Boyle, Rick and Cantin, Pierre-luc and Chao, Clifford and Clark, Chris and Coriell, Jeremy and Daley, Mike and Dau, Matt and Dean, Jeffrey and Gelb, Ben and Ghaemmaghami, Tara Vazir and Gottipati, Rajendra and Gulland, William and Hagmann, Robert and Ho, C. Richard and Hogberg, Doug and Hu, John and Hundt, Robert and Hurt, Dan and Ibarz, Julian and Jaffey, Aaron and Jaworski, Alek and Kaplan, Alexander and Khaitan, Harshit and Killebrew, Daniel and Koch, Andy and Kumar, Naveen and Lacy, Steve and Laudon, James and Law, James and Le, Diemthu and Leary, Chris and Liu, Zhuyuan and Lucke, Kyle and Lundin, Alan and MacKean, Gordon and Maggiore, Adriana and Mahony, Maire and Miller, Kieran and Nagarajan, Rahul and Narayanaswami, Ravi and Ni, Ray and Nix, Kathy and Norrie, Thomas and Omernick, Mark and Penukonda, Narayana and Phelps, Andy and Ross, Jonathan and Ross, Matt and Salek, Amir and Samadiani, Emad and Severn, Chris and Sizikov, Gregory and Snelham, Matthew and Souter, Jed and Steinberg, Dan and Swing, Andy and Tan, Mercedes and Thorson, Gregory and Tian, Bo and Toma, Horia and Tuttle, Erick and Vasudevan, Vijay and Walter, Richard and Wang, Walter and Wilcox, Eric and Yoon, Doe Hyun}, + booktitle = {Proceedings of the 44th Annual International Symposium on Computer Architecture}, + doi = {10.1145/3079856.3080246}, + isbn = {9781450348928}, + keywords = {accelerator, neural network, MLP, TPU, CNN, deep learning, domain-specific architecture, GPU, TensorFlow, DNN, RNN, LSTM}, + location = {Toronto, ON, Canada}, + numpages = {12}, + pages = {1-12}, + publisher = {Association for Computing Machinery}, 
+ series = {ISCA '17}, + title = {In-Datacenter Performance Analysis of a Tensor Processing Unit}, + url = {https://doi.org/10.1145/3079856.3080246}, + year = {2017}, + Bdsk-Url-1 = {https://doi.org/10.1145/3079856.3080246}} + +@article{Norrie2021TPUv2_3, + author = {Norrie, Thomas and Patil, Nishant and Yoon, Doe Hyun and Kurian, George and Li, Sheng and Laudon, James and Young, Cliff and Jouppi, Norman and Patterson, David}, + doi = {10.1109/MM.2021.3058217}, + journal = {IEEE Micro}, + number = {2}, + pages = {56-63}, + title = {The Design Process for Google's Training Chips: TPUv2 and TPUv3}, + volume = {41}, + year = {2021}, + Bdsk-Url-1 = {https://doi.org/10.1109/MM.2021.3058217}} + +@inproceedings{Jouppi2023TPUv4, + abstract = {In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5\% of system cost and <3\% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x--7x yet use only 5\% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus nearly 10x faster overall, which along with OCS flexibility and availability allows a large language model to train at an average of ~60\% of peak FLOPS/second. For similar sized systems, it is ~4.3x--4.5x faster than the Graphcore IPU Bow and is 1.2x--1.7x faster and uses 1.3x--1.9x less power than the Nvidia A100. 
TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~2--6x less energy and produce ~20x less CO2e than contemporary DSAs in typical on-premise data centers.}, + address = {New York, NY, USA}, + articleno = {82}, + author = {Jouppi, Norm and Kurian, George and Li, Sheng and Ma, Peter and Nagarajan, Rahul and Nai, Lifeng and Patil, Nishant and Subramanian, Suvinay and Swing, Andy and Towles, Brian and Young, Clifford and Zhou, Xiang and Zhou, Zongwei and Patterson, David A}, + booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture}, + doi = {10.1145/3579371.3589350}, + isbn = {9798400700958}, + keywords = {warehouse scale computer, embeddings, supercomputer, domain specific architecture, reconfigurable, TPU, large language model, power usage effectiveness, CO2 equivalent emissions, energy, optical interconnect, IPU, machine learning, GPU, carbon emissions}, + location = {Orlando, FL, USA}, + numpages = {14}, + publisher = {Association for Computing Machinery}, + series = {ISCA '23}, + title = {TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings}, + url = {https://doi.org/10.1145/3579371.3589350}, + year = {2023}, + Bdsk-Url-1 = {https://doi.org/10.1145/3579371.3589350}} + +@misc{zhou2021analognets, + archiveprefix = {arXiv}, + author = {Chuteng Zhou and Fernando Garcia Redondo and Julian B{\"u}chel and Irem Boybat and Xavier Timoneda Comas and S. R. Nandakumar and Shidhartha Das and Abu Sebastian and Manuel Le Gallo and Paul N. Whatmough}, + eprint = {2111.06503}, + primaryclass = {cs.AR}, + title = {AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator}, + year = 2021} + +@article{wearableinsulin, + article-number = {719}, + author = {Psoma, Sotiria D. 
and Kanthou, Chryso}, + doi = {10.3390/bios13070719}, + issn = {2079-6374}, + journal = {Biosensors}, + number = {7}, + pubmedid = {37504117}, + title = {Wearable Insulin Biosensors for Diabetes Management: Advances and Challenges}, + url = {https://www.mdpi.com/2079-6374/13/7/719}, + volume = {13}, + year = {2023}, + Bdsk-Url-1 = {https://www.mdpi.com/2079-6374/13/7/719}, + Bdsk-Url-2 = {https://doi.org/10.3390/bios13070719}} + +@article{glucosemonitor, + author = {Li, Jingzhen and Tobore, Igbe and Liu, Yuhang and Kandwal, Abhishek and Wang, Lei and Nie, Zedong}, + doi = {10.1109/JBHI.2021.3072628}, + journal = {IEEE Journal of Biomedical and Health Informatics}, + number = {9}, + pages = {3340-3350}, + title = {Non-invasive Monitoring of Three Glucose Ranges Based On ECG By Using DBSCAN-CNN}, + volume = {25}, + year = {2021}, + Bdsk-Url-1 = {https://doi.org/10.1109/JBHI.2021.3072628}} + +@article{plasma, + author = {Attia, Zachi and Sugrue, Alan and Asirvatham, Samuel and Ackerman, Michael and Kapa, Suraj and Friedman, Paul and Noseworthy, Peter}, + doi = {10.1371/journal.pone.0201059}, + journal = {PLOS ONE}, + month = {08}, + pages = {e0201059}, + title = {Noninvasive assessment of dofetilide plasma concentration using a deep learning (neural network) analysis of the surface electrocardiogram: A proof of concept study}, + volume = {13}, + year = {2018}, + Bdsk-Url-1 = {https://doi.org/10.1371/journal.pone.0201059}} + +@article{afib, + author = {Yutao Guo and Hao Wang and Hui Zhang and Tong Liu and Zhaoguang Liang and Yunlong Xia and Li Yan and Yunli Xing and Haili Shi and Shuyan Li and Yanxia Liu and Fan Liu and Mei Feng and Yundai Chen and Gregory Y.H. 
Lip}, + doi = {10.1016/j.jacc.2019.08.019}, + journal = {Journal of the American College of Cardiology}, + number = {19}, + pages = {2365-2375}, + title = {Mobile Photoplethysmographic Technology to Detect Atrial Fibrillation}, + volume = {74}, + year = {2019}, + Bdsk-Url-1 = {https://doi.org/10.1016/j.jacc.2019.08.019}} + +@article{gaitathome, + author = {Yingcheng Liu and Guo Zhang and Christopher G. Tarolli and Rumen Hristov and Stella Jensen-Roberts and Emma M. Waddell and Taylor L. Myers and Meghan E. Pawlik and Julia M. Soto and Renee M. Wilson and Yuzhe Yang and Timothy Nordahl and Karlo J. Lizarraga and Jamie L. Adams and Ruth B. Schneider and Karl Kieburtz and Terry Ellis and E. Ray Dorsey and Dina Katabi}, + doi = {10.1126/scitranslmed.adc9669}, + eprint = {https://www.science.org/doi/pdf/10.1126/scitranslmed.adc9669}, + journal = {Science Translational Medicine}, + number = {663}, + pages = {eadc9669}, + title = {Monitoring gait at home with radio waves in Parkinson's disease: A marker of severity, progression, and medication response}, + url = {https://www.science.org/doi/abs/10.1126/scitranslmed.adc9669}, + volume = {14}, + year = {2022}, + Bdsk-Url-1 = {https://www.science.org/doi/abs/10.1126/scitranslmed.adc9669}, + Bdsk-Url-2 = {https://doi.org/10.1126/scitranslmed.adc9669}} + +@article{Chen2023, + author = {Chen, Emma and Prakash, Shvetank and Janapa Reddi, Vijay and Kim, David and Rajpurkar, Pranav}, + day = {06}, + doi = {10.1038/s41551-023-01115-0}, + issn = {2157-846X}, + journal = {Nature Biomedical Engineering}, + month = {Nov}, + title = {A framework for integrating artificial intelligence for clinical care with continuous therapeutic monitoring}, + url = {https://doi.org/10.1038/s41551-023-01115-0}, + year = {2023}, + Bdsk-Url-1 = {https://doi.org/10.1038/s41551-023-01115-0}} + +@article{Zhang2017, + author = {Zhang, Qingxue and Zhou, Dian and Zeng, Xuan}, + day = {06}, + doi = {10.1186/s12938-017-0317-z}, + issn = 
{1475-925X}, + journal = {BioMedical Engineering OnLine}, + month = {Feb}, + number = {1}, + pages = {23}, + title = {Highly wearable cuff-less blood pressure and heart rate monitoring with single-arm electrocardiogram and photoplethysmogram signals}, + url = {https://doi.org/10.1186/s12938-017-0317-z}, + volume = {16}, + year = {2017}, + Bdsk-Url-1 = {https://doi.org/10.1186/s12938-017-0317-z}} + +@misc{yik2023neurobench, + archiveprefix = {arXiv}, + author = {Jason Yik and Soikat Hasan Ahmed and Zergham Ahmed and Brian Anderson and Andreas G. Andreou and Chiara Bartolozzi and Arindam Basu and Douwe den Blanken and Petrut Bogdan and Sander Bohte and Younes Bouhadjar and Sonia Buckley and Gert Cauwenberghs and Federico Corradi and Guido de Croon and Andreea Danielescu and Anurag Daram and Mike Davies and Yigit Demirag and Jason Eshraghian and Jeremy Forest and Steve Furber and Michael Furlong and Aditya Gilra and Giacomo Indiveri and Siddharth Joshi and Vedant Karia and Lyes Khacef and James C. Knight and Laura Kriener and Rajkumar Kubendran and Dhireesha Kudithipudi and Gregor Lenz and Rajit Manohar and Christian Mayr and Konstantinos Michmizos and Dylan Muir and Emre Neftci and Thomas Nowotny and Fabrizio Ottati and Ayca Ozcelikkale and Noah Pacik-Nelson and Priyadarshini Panda and Sun Pao-Sheng and Melika Payvand and Christian Pehle and Mihai A. Petrovici and Christoph Posch and Alpha Renner and Yulia Sandamirskaya and Clemens JS Schaefer and Andr{\'e} van Schaik and Johannes Schemmel and Catherine Schuman and Jae-sun Seo and Sadique Sheik and Sumit Bam Shrestha and Manolis Sifalakis and Amos Sironi and Kenneth Stewart and Terrence C. Stewart and Philipp Stratmann and Guangzhi Tang and Jonathan Timcheck and Marian Verhelst and Craig M. 
Vineyard and Bernhard Vogginger and Amirreza Yousefzadeh and Biyan Zhou and Fatima Tuz Zohora and Charlotte Frenkel and Vijay Janapa Reddi}, + eprint = {2304.04640}, + primaryclass = {cs.AI}, + title = {NeuroBench: Advancing Neuromorphic Computing through Collaborative, Fair and Representative Benchmarking}, + year = {2023}} \ No newline at end of file