{"id":329280,"date":"2023-09-11T13:00:00","date_gmt":"2023-09-11T13:00:00","guid":{"rendered":"https:\/\/pyimagesearch.com\/?p=40239"},"modified":"2023-09-11T13:00:00","modified_gmt":"2023-09-11T13:00:00","slug":"sam-from-meta-ai-part-1-segmentation-with-prompts","status":"publish","type":"post","link":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/","title":{"rendered":"SAM from Meta AI (Part 1): Segmentation with Prompts"},"content":{"rendered":"<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"TOC\"\/>\n<h2><strong>Table of Contents<\/strong><\/h2>\n<div class=\"toc\">\n<ul>\n<li id=\"TOC-h2BPTitle\"><a 
rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h2BPTitle\">SAM from Meta AI (Part 1): Segmentation with Prompts<\/a><\/li>\n<ul>\n<li id=\"TOC-h3Anything\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Anything\">Segment Anything<\/a><\/li>\n<ul>\n<li id=\"TOC-h4Training\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h4Training\">Training SAM<\/a><\/li>\n<li id=\"TOC-h4Inference\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h4Inference\">Inference with SAM<\/a><\/li>\n<\/ul>\n<li id=\"TOC-h3Structure\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Structure\">Project Structure<\/a><\/li>\n<li id=\"TOC-h3Development\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Development\">Configuring Your Development Environment<\/a><\/li>\n<li id=\"TOC-h3Help\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Help\">Need Help Configuring Your Development Environment?<\/a><\/li>\n<li id=\"TOC-h3Creating\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Creating\">Creating Our Configuration File<\/a><\/li>\n<li id=\"TOC-h3Visualization\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Visualization\">Implementing Visualization Functions<\/a><\/li>\n<li id=\"TOC-h3SAM\"><a rel=\"noopener\"  
href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3SAM\">Segmentation with SAM<\/a><\/li>\n<li id=\"TOC-h3TextPrompts\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3TextPrompts\">Segmenting with SAM and Text Prompts<\/a><\/li>\n<\/ul>\n<li id=\"TOC-h2Summary\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h2Summary\">Summary<\/a><\/li>\n<ul>\n<li id=\"TOC-h3Credits\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Credits\">Credits<\/a><\/li>\n<li id=\"TOC-h3Citation\"><a rel=\"noopener\"  href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#h3Citation\">Citation Information<\/a><\/li>\n<\/ul>\n<\/ul>\n<\/div>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h2BPTitle\"\/>\n<h2><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h2BPTitle\"><strong>SAM from Meta AI (Part 1): Segmentation with Prompts<\/strong><\/a><\/h2>\n<p>In this tutorial, you will learn about the Segment Anything Model (SAM) from Meta AI and delve deeper into the ideas and concepts behind this newly released foundational segmentation model. 
Furthermore, you will learn how SAM can be used for making segmentation predictions in real-time and how you can integrate it with your own computer vision projects.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><a href=\"https:\/\/pyimagesearch.com\/wp-content\/uploads\/2023\/09\/sam-part1-featured.png\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured-1024x575.png?lossy=2&#038;strip=1&#038;webp=1\" alt=\"\" class=\"wp-image-41249\" width=\"700\" height=\"393\" srcset=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured.png?size=126x71&amp;lossy=2&amp;strip=1&amp;webp=1 126w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured-300x169.png?lossy=2&amp;strip=1&amp;webp=1 300w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured.png?size=378x212&amp;lossy=2&amp;strip=1&amp;webp=1 378w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured.png?size=504x283&amp;lossy=2&amp;strip=1&amp;webp=1 504w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured.png?size=630x354&amp;lossy=2&amp;strip=1&amp;webp=1 630w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured-768x431.png?lossy=2&amp;strip=1&amp;webp=1 768w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured-1024x575.png?lossy=2&amp;strip=1&amp;webp=1 1024w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured.png?lossy=2&amp;strip=1&amp;webp=1 1080w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured-1536x863.png?lossy=2&amp;strip=1&amp;webp=1 1536w, 
https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2023\/09\/sam-part1-featured-2048x1151.png?lossy=2&amp;strip=1&amp;webp=1 2048w\" sizes=\"(max-width: 630px) 100vw, 630px\" \/><\/a><\/figure>\n<\/div>\n<p>This lesson is the 1st of a 2-part series on <strong>Segment Anything Model (SAM) from Meta AI<\/strong>:<\/p>\n<ol>\n<li><a href=\"https:\/\/pyimg.co\/0ivy4\"  rel=\"noreferrer noopener\"><strong><em>SAM from Meta AI (Part 1): Segmentation with Prompts<\/em><\/strong><\/a><strong><em> <\/em>(this tutorial)<\/strong><\/li>\n<li><em>SAM from Meta AI (Part 2): Integration with CLIP for Downstream Tasks<\/em><\/li>\n<\/ol>\n<p>In the first part of this tutorial series, we will develop a holistic understanding of SAM and discuss in detail how SAM can be prompted in different ways, which allows it to segment specific regions in any image in real-time.<\/p>\n<p>In the second part of this tutorial, we will take a step ahead and understand how SAM can be integrated with other foundational models like contrastive language-image pre-training (CLIP) to perform varied downstream tasks like zero-shot classification, text-to-image retrieval, and image similarity.<\/p>\n<p><strong>To learn how to use SAM in your own projects, <\/strong><strong><em>just keep reading.<\/em><\/strong><\/p>\n<div id=\"pyi-source-code-block\" class=\"source-code-wrap\">\n<div class=\"gpd-source-code\">\n<div class=\"gpd-source-code-content\">\n        <img decoding=\"async\" src=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2020\/01\/source-code-icon.png?lossy=2&#038;strip=1&#038;webp=1\" alt=\"\"><\/p>\n<h4>Looking for the source code to this post?<\/h4>\n<p>                    <a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#download-the-code\" class=\"pyis-cta-modal-open-modal\">Jump Right To The Downloads Section <svg class=\"svg-icon arrow-right\" width=\"12\" height=\"12\" aria-hidden=\"true\" role=\"img\" 
focusable=\"false\" viewBox=\"0 0 14 14\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\"><path d=\"M6.8125 0.1875C6.875 0.125 6.96875 0.09375 7.09375 0.09375C7.1875 0.09375 7.28125 0.125 7.34375 0.1875L13.875 6.75C13.9375 6.8125 14 6.90625 14 7C14 7.125 13.9375 7.1875 13.875 7.25L7.34375 13.8125C7.28125 13.875 7.1875 13.9062 7.09375 13.9062C6.96875 13.9062 6.875 13.875 6.8125 13.8125L6.1875 13.1875C6.125 13.125 6.09375 13.0625 6.09375 12.9375C6.09375 12.8438 6.125 12.75 6.1875 12.6562L11.0312 7.8125H0.375C0.25 7.8125 0.15625 7.78125 0.09375 7.71875C0.03125 7.65625 0 7.5625 0 7.4375V6.5625C0 6.46875 0.03125 6.375 0.09375 6.3125C0.15625 6.25 0.25 6.1875 0.375 6.1875H11.0312L6.1875 1.34375C6.125 1.28125 6.09375 1.1875 6.09375 1.0625C6.09375 0.96875 6.125 0.875 6.1875 0.8125L6.8125 0.1875Z\" fill=\"#169FE6\"><\/path><\/svg><\/a>\n            <\/div>\n<\/div>\n<\/div>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h2BPTitle\"\/>\n<h2><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h2BPTitle\"><strong>SAM from Meta AI (Part 1): Segmentation with Prompts<\/strong><\/a><\/h2>\n<p>In the past, computer vision models have mostly relied on methods trained on a specific task with task-specific annotated datasets. They can perform that particular task on novel examples at test time. This affects the practical usability of these models since it is not always possible or feasible to have access to or collect large amounts of data for every task at hand.<\/p>\n<p>Furthermore, the generalization performance of these models considerably deteriorates if the distribution of examples at test time deviates from the data distribution of the training examples. 
This limits their applicability in practical, real-world settings where we need systems that can perform multiple downstream tasks effectively and are robust to distribution shifts.<\/p>\n<p>To address the aforementioned issues, there has been considerable effort in the computer vision community to develop models with more general capabilities that can holistically understand data and perform various downstream tasks across varied data distributions.<\/p>\n<p>Recent progress toward developing such general-purpose &#8220;foundational models&#8221; has generated considerable excitement in the machine learning and computer vision communities. This direction of research was first explored by the natural language processing (NLP) community (which eventually led to the development of ChatGPT) and has gradually been picked up by the computer vision community to develop holistic general-purpose models that can perform varied tasks on their own or be integrated into pre-established systems to improve performance.<\/p>\n<p>In computer vision, the recent foundational models mainly rely on large-scale web data and on aligning image-text pairs to train strong representation models. One such example is the CLIP model, which learns representations that capture the semantics of various objects and generalize in a zero-shot way (i.e., without further fine-tuning). The CLIP model shows exceptional results on zero-shot image classification tasks and can outperform various fine-tuned supervised models.<\/p>\n<p>Along similar lines, Meta AI recently released the Segment Anything Model (SAM), the first attempt to design a foundational model for image segmentation. SAM can perform segmentation on data from various distributions and can adapt to solve several downstream tasks at test time. 
Furthermore, SAM can be seamlessly integrated with pre-established computer vision models and systems to boost their capabilities and performance on complex tasks for which training task-specific models would not be feasible.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Anything\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Anything\"><strong>Segment Anything<\/strong><\/a><\/h3>\n<p>SAM marks the first step toward developing general-purpose foundational models for image segmentation tasks. Like other foundational models, it is pre-trained on a large-scale dataset: 11 million images annotated with more than 1 billion masks. This allows it to generalize to diverse image distributions and objects at test time (<a href=\"https:\/\/segment-anything.com\/dataset\/index.html\"  rel=\"noreferrer noopener\">https:\/\/segment-anything.com\/dataset\/index.html<\/a>).<\/p>\n<p>SAM is designed and developed to be promptable, allowing it to tackle different tasks at inference, including those it was not trained to perform. Furthermore, this enables it to integrate with other computer vision systems for different downstream tasks apart from image segmentation. Let us discuss this in further detail to understand the key ideas behind this approach.<\/p>\n<p>Prompt engineering refers to crafting text inputs to get desired responses from foundational models. For example, engineered text prompts are used to query ChatGPT and get a useful or desirable response for the user. Similarly, CLIP is prompted with hand-engineered text prompts to enhance its zero-shot classification performance on object categories.<\/p>\n<p>Taking inspiration from the aforementioned foundational models, SAM defines a <em>promptable segmentation<\/em> task in which the prompt takes the form of a point, bounding box, or text. The model tries to predict a segmentation mask for the region indicated by the input prompt. 
Once the model is trained, SAM can be prompted with various engineered prompts per the downstream task to enable a wide range of downstream applications similar to other foundational models like ChatGPT and CLIP.<\/p>\n<p>Let us delve deeper and get an overview of the training and inference details, which allow SAM to generalize to new data distributions and tasks at inference.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Training\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h4Training\"><strong>Training SAM<\/strong><\/a><\/h4>\n<p>SAM is pre-trained with a prompt-based segmentation pre-training objective. Specifically, it involves a sequence of prompts (e.g., points, boxes, masks) input to the model with an image sample. The model outputs a segmentation mask prediction based on the prompt, which is then compared with the ground truth segmentation mask to compute the loss.<\/p>\n<p><strong>Figure 1<\/strong> shows an overview of the SAM training pipeline. First, an image is input to the transformer-based image encoder, which outputs feature representation as image embeddings, as shown in the figure. 
SAM uses a masked autoencoder-based pre-trained vision transformer as the image encoder.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh4.googleusercontent.com\/D3AeqOs5BFDmQbtowOoZqyAd7y2iezCT-7RToOM7wAZBtzWyUqbasnOnRgHwiao4fjNkT_dj4QdUDfD48keFEpk73Icd5PkjyLOYDGW0oErWWtjUfOkUAHI9u-PyU1HhhM01Ptlpof4nWicVf89WiaI\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/D3AeqOs5BFDmQbtowOoZqyAd7y2iezCT-7RToOM7wAZBtzWyUqbasnOnRgHwiao4fjNkT_dj4QdUDfD48keFEpk73Icd5PkjyLOYDGW0oErWWtjUfOkUAHI9u-PyU1HhhM01Ptlpof4nWicVf89WiaI\" alt=\"\" width=\"700\" height=\"146\"\/><\/a><figcaption><strong>Figure 1:<\/strong> Overview of the SAM pipeline (source: <a href=\"https:\/\/arxiv.org\/pdf\/2304.02643.pdf\"  rel=\"noreferrer noopener\">Segment Anything | Meta AI Research<\/a>).<\/figcaption><\/figure>\n<\/div>\n<p>Next, the model takes input prompts such as points, bounding boxes, text, or masks and encodes them using the prompt encoder and convolutions (shown in purple).<\/p>\n<p>The mask decoder (shown in yellow) maps the input representations of the image and the prompts to an output mask which is then compared with the ground truth mask to compute the loss and backpropagate through the network.<\/p>\n<p>SAM uses focal loss and dice loss for training the model. The focal loss is simply a variation of the cross-entropy loss function, ensuring that the pixels are classified correctly in the predicted segmentation mask. 
On the other hand, the dice loss aims to increase the overlap between the predicted and ground truth masks (measured by the Dice coefficient, i.e., twice the area of their intersection divided by the sum of their areas, which is closely related to the intersection over union).<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h4Inference\"\/>\n<h4><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h4Inference\"><strong>Inference with SAM<\/strong><\/a><\/h4>\n<p>Once SAM is pre-trained with the promptable segmentation objective mentioned above, it can be used to segment objects or regions in images based on the input prompt provided by the user.<\/p>\n<p>For example, given an image of a kitchen with a potted plant on the countertop, we can sample one or more points on the plant region and pass them as input to SAM to segment out the plant in the image. We can also provide a bounding box around the potted plant and ask SAM to segment the object inside the bounding box.<\/p>\n<p>Furthermore, we can use prompting to further refine and control the region we want segmented. For instance, if we want to segment only the region with plant branches and leaves and exclude the pot which holds the plant, we can simply pass points on the plant region as positive points and points on the pot as negative points. This tells the model to predict a segmentation mask for the region that includes the positive points and excludes the negative points.<\/p>\n<p>Additionally, we can use any combination of these prompts to specify the region we want to segment. For instance, we can provide a combination of bounding box coordinates and negative points to specify a region inside the box while excluding the areas around the negative points.<\/p>\n<p>Note that currently, the released code for SAM does not directly support text-based prompts. 
However, in this tutorial, we will see how to integrate SAM with another off-the-shelf model (i.e., Grounding DINO) to use text prompts for segmenting objects.<\/p>\n<p>Let us now go ahead and implement the code to perform segmentation tasks as explained above and see SAM make predictions in real time.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Structure\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Structure\"><strong>Project Structure<\/strong><\/a><\/h3>\n<p>We first need to review our project directory structure.<\/p>\n<p>Start by accessing this tutorial\u2019s <strong><em>\u201cDownloads\u201d<\/em><\/strong> section to retrieve the source code and example images.<\/p>\n<p>From there, take a look at the directory structure:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"1\">\u251c\u2500\u2500 clip_integration.py\n\u251c\u2500\u2500 gdino_integration.py\n\u251c\u2500\u2500 get_objects.py\n\u251c\u2500\u2500 images\n\u2502   \u251c\u2500\u2500 kitchen.jpeg\n\u2502   \u2514\u2500\u2500 living_room.jpg\n\u251c\u2500\u2500 pyimagesearch\n\u2502   \u251c\u2500\u2500 config.py\n\u2502   \u2514\u2500\u2500 utils.py\n\u251c\u2500\u2500 requirements.txt\n\u251c\u2500\u2500 sam.py\n\u2514\u2500\u2500 setup.sh<\/pre>\n<p>Although it does not appear in the listing above, the project also uses a checkpoints folder, which holds the pre-trained checkpoints for SAM and the Grounding DINO model, as we will see later in this tutorial.<\/p>\n<p>The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">clip_integration.py<\/code> file implements the code to integrate SAM with CLIP, which we will discuss in depth in the next tutorial of this series. 
<\/p>\n<p>Furthermore, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">gdino_integration.py<\/code> file implements the code that lets us prompt SAM with text, with the help of Grounding DINO, and predict segmentation masks in real time.<\/p>\n<p>Next, in the directory structure, we have the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">get_objects.py<\/code> file, which we will discuss in detail in the next tutorial in this series, and the images folder, which contains the two images we will use for this tutorial series.<\/p>\n<p>In the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">pyimagesearch<\/code> folder, we have the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">config.py<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">utils.py<\/code> files, which respectively define the parameters and initial configurations, and the helper functions that allow us to visualize our predictions.<\/p>\n<p>Furthermore, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sam.py<\/code> file implements the code to use different prompts like points and bounding boxes to predict segmentation masks with SAM.<\/p>\n<p>Finally, the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">requirements.txt<\/code> file lists the packages and modules required to set up our environment, and the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">setup.sh<\/code> file contains code to download other dependencies and pre-trained checkpoints.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Development\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Development\"><strong>Configuring Your Development Environment<\/strong><\/a><\/h3>\n<p>To follow this guide, you need to have the SAM and off-the-shelf Grounding DINO packages installed. 
Furthermore, you will need to download the pre-trained checkpoints for these models.<\/p>\n<p>Luckily, this can be done easily by following the commands below:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"2\">$ sh setup.sh\n$ pip install -r requirements.txt<\/pre>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Help\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Help\"><strong>Need Help Configuring Your Development Environment?<\/strong><\/a><\/h3>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><a href=\"https:\/\/pyimagesearch.com\/pyimagesearch-university\/\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"500\" height=\"334\" src=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?lossy=2&#038;strip=1&#038;webp=1\" alt=\"Need help configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? 
Be sure to join PyImageSearch University \u2014 you\u2019ll be up and running with this tutorial in minutes.\" class=\"wp-image-19836\" srcset=\"https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?size=126x84&amp;lossy=2&amp;strip=1&amp;webp=1 126w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter-300x200.png?lossy=2&amp;strip=1&amp;webp=1 300w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?size=378x253&amp;lossy=2&amp;strip=1&amp;webp=1 378w, https:\/\/b2633864.smushcdn.com\/2633864\/wp-content\/uploads\/2021\/05\/pyimagesearch_plus_jupyter.png?lossy=2&amp;strip=1&amp;webp=1 500w\" sizes=\"(max-width: 500px) 100vw, 500px\" \/><\/a><figcaption>Need help configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? Be sure to join <a href=\"https:\/\/pyimagesearch.com\/pyimagesearch-university\/\"  rel=\"noreferrer noopener\">PyImageSearch University<\/a> \u2014 you\u2019ll be up and running with this tutorial in minutes.<\/figcaption><\/figure>\n<\/div>\n<p>All that said, are you:<\/p>\n<ul>\n<li>Short on time?<\/li>\n<li>Learning on your employer\u2019s administratively locked system?<\/li>\n<li>Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?<\/li>\n<li><strong>Ready to run the code <\/strong><strong><em>immediately<\/em><\/strong><strong> on your Windows, macOS, or Linux system?<\/strong><\/li>\n<\/ul>\n<p>Then join <a href=\"https:\/\/pyimagesearch.com\/pyimagesearch-university\/\"  rel=\"noreferrer noopener\">PyImageSearch University<\/a> today!<\/p>\n<p><strong>Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides <\/strong><strong><em>pre-configured<\/em><\/strong><strong> to run on Google Colab\u2019s ecosystem right in your web browser!<\/strong> No installation 
required.<\/p>\n<p>And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Creating\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Creating\"><strong>Creating Our Configuration File<\/strong><\/a><\/h3>\n<p>It is now time to start our implementation and see the code that allows us to segment objects with prompts in real time.<\/p>\n<p>We start by opening our <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.py<\/code> file, which contains the initial parameters and configurations we will use in this tutorial.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"3\">SAM_CHECKPOINT_PATH = \"checkpoints\/sam_vit_h_4b8939.pth\"\nMODEL_TYPE = \"vit_h\"\n\nGDINO_CONFIG = \"GroundingDINO\/groundingdino\/config\/GroundingDINO_SwinT_OGC.py\"\nGDINO_CHECKPOINT_PATH = \"checkpoints\/groundingdino_swint_ogc.pth\"\n\nIMG_PATH = [\"images\/kitchen.jpeg\", \"images\/living_room.jpg\"]\nTEXT_PROMPT = [\"plant\", \"apples\", \"vase\"]\nIMG_SIZE = (256, 256)\n\nBOX_TRESHOLD = 0.35\nTEXT_TRESHOLD = 0.25\n\nOUT_PATH = \"predictions\"\nOUT_PROMPT_PATH = \"prompt_image.jpg\"\nOUT_PRED_PATH = \"predicted_image.jpg\"<\/pre>\n<p>We start by defining the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">SAM_CHECKPOINT_PATH<\/code>, which points to the location where the pre-trained weights of SAM are stored (<strong>Line 1<\/strong>), and also define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">MODEL_TYPE<\/code>, which indicates the type of vision transformer architecture to use for the image encoder in the SAM 
architecture (<strong>Line 2<\/strong>).<\/p>\n<p>Next, we define the path to the configuration file (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">GDINO_CONFIG<\/code>) for the Grounding DINO model on <strong>Line 4<\/strong>, which will help us get segmentations for text prompts from SAM, as we will discuss later in the tutorial. Furthermore, we also define the path to the pre-trained Grounding DINO checkpoint on <strong>Line 5<\/strong> (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">GDINO_CHECKPOINT_PATH<\/code>).<\/p>\n<p>Now that we have defined the model-related parameters, let us go ahead and define the image and prompt-related parameters next. <\/p>\n<p>On <strong>Line 7<\/strong>, we define the paths to the two images we will use for the purpose of this tutorial as a list (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">IMG_PATH<\/code>). Next, on <strong>Line 8<\/strong>, we define the list <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">TEXT_PROMPT<\/code>, which contains the various text-based prompts we will use to query our model for segmentation prediction. 
On <strong>Line 9<\/strong>, we define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">IMG_SIZE<\/code>, which indicates the dimension of the images.<\/p>\n<p>Apart from this, we also define the thresholds for bounding box and text (i.e., <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">BOX_TRESHOLD<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">TEXT_TRESHOLD<\/code>) on <strong>Lines 11 and 12<\/strong>, which the Grounding DINO model will use to make predictions as well and which we will discuss in detail later in the tutorial.<\/p>\n<p>Finally, we define the parameters for the output folder where our predictions will be stored.<\/p>\n<p>On <strong>Line 14<\/strong>, we define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">OUT_PATH<\/code>, which points to the folder\u2019s location where our predictions will be stored. Furthermore, on <strong>Lines 15 and 16<\/strong>, we define the filenames that will be used to save our image with the prompt visualization (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">OUT_PROMPT_PATH<\/code>) and the final image with the predicted segmentation (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">OUT_PRED_PATH<\/code>).<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Visualization\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Visualization\"><strong>Implementing Visualization Functions<\/strong><\/a><\/h3>\n<p>Now that we have discussed the parameter configurations, let us implement and discuss the helper functions, allowing us to visualize our prompt and final segmentation predictions from SAM.<\/p>\n<p>We open the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">utils.py<\/code> file and get started.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" 
data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"4\">import matplotlib.pyplot as plt\nimport numpy as np\n\n\ndef show_points(coords, labels, ax, marker_size=375):\n    pos_points = coords[labels == 1]\n    neg_points = coords[labels == 0]\n    ax.scatter(\n        pos_points[:, 0],\n        pos_points[:, 1],\n        color=\"green\",\n        marker=\"*\",\n        s=marker_size,\n        edgecolor=\"white\",\n        linewidth=1.25,\n    )\n    ax.scatter(\n        neg_points[:, 0],\n        neg_points[:, 1],\n        color=\"red\",\n        marker=\"*\",\n        s=marker_size,\n        edgecolor=\"white\",\n        linewidth=1.25,\n    )\n\n\ndef show_box(box, ax, processed_dim=False):\n    x0, y0 = box[0], box[1]\n    if processed_dim:\n        w, h = box[2], box[3]\n    else:\n        w, h = box[2] - box[0], box[3] - box[1]\n    ax.add_patch(\n        plt.Rectangle((x0, y0), w, h, edgecolor=\"green\", facecolor=(0, 0, 0, 0), lw=2)\n    )\n\n\ndef show_mask(mask, ax):\n    color = np.array([0, 0, 1, 0.6])\n    h, w = mask.shape[-2:]\n    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)\n    ax.imshow(mask_image)\n\n\ndef show_all_masks(masks):\n    sorted_masks = sorted(masks, key=(lambda x: x[\"area\"]), reverse=True)\n    ax = plt.gca()\n    ax.set_autoscale_on(False)\n\n    img = np.ones(\n        (\n            sorted_masks[0][\"segmentation\"].shape[0],\n            sorted_masks[0][\"segmentation\"].shape[1],\n            4,\n        )\n    )\n    img[:, :, 3] = 0\n    for mask in sorted_masks:\n        m = mask[\"segmentation\"]\n        color_mask = np.concatenate([np.random.random(3), [0.35]])\n        img[m] = color_mask\n    ax.imshow(img)<\/pre>\n<p>We start by importing the necessary packages, such as <code class=\"EnlighterJSRAW\" 
data-enlighter-language=\"python\">matplotlib<\/code> (<strong>Line 1<\/strong>) and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">numpy<\/code> (<strong>Line 2<\/strong>).<\/p>\n<p>Next, we define our helper functions, which will allow us to plot the different prompts and visualize the segmentation masks.<\/p>\n<p>We start by implementing the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">show_points<\/code> function (<strong>Lines 5-25<\/strong>), which allows us to visualize the points in the prompt by plotting them on the matplotlib axes passed to the function.<\/p>\n<p>The function takes as input the point coordinates, the binary labels corresponding to the points (indicating whether each point should be included in or excluded from the predicted segmentation mask), the matplotlib axes object (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">ax<\/code>), and the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">marker_size<\/code> for the plotted points (<strong>Line 5<\/strong>).<\/p>\n<p>Next, we get the points with labels equal to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">1<\/code>, store them as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">pos_points<\/code>, and also get the points with labels equal to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0<\/code> and store them as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">neg_points<\/code> (<strong>Lines 6 and 7<\/strong>).<\/p>\n<p>Then we use the matplotlib <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">scatter()<\/code> function to plot the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">pos_points<\/code> in green and the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">neg_points<\/code> in red, as shown on <strong>Lines 8-25<\/strong>. 
Note that this function takes as input the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">x<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">y<\/code> coordinates of the points, the marker type, the color of the marker, the size of the marker, the color of the edge, and the line width, as shown.<\/p>\n<p>Next, we define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">show_box<\/code> function, which allows us to visualize the bounding box in the prompt by plotting it on the matplotlib axes passed to the function.<\/p>\n<p>The function takes as input the box (i.e., a list with the coordinates of the box), the matplotlib axes object (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">ax<\/code>) on which we want to plot the box, and the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">processed_dim<\/code> argument.<\/p>\n<p>Note that the input box (a set of 4 values) can be provided in either of 2 formats:<\/p>\n<ul>\n<li><strong>Case 1:<\/strong> <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">x, y<\/code> coordinates of the top-left corner and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">x, y<\/code> coordinates of the bottom-right corner (in image coordinates, where the y-axis points downward).<\/li>\n<li><strong>Case 2:<\/strong> <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">x, y<\/code> coordinates of the top-left corner and the width and height of the box.<\/li>\n<\/ul>\n<p>The <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">processed_dim<\/code> argument states which format the bounding box is provided in. 
It is <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">False<\/code> in <strong>Case 1<\/strong> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">True<\/code> in <strong>Case 2<\/strong>.<\/p>\n<p>On <strong>Line 29<\/strong>, we get the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">x0<\/code>, <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">y0<\/code> coordinates as shown. If the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">processed_dim<\/code> flag is <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">True<\/code>, we read the width and height of the bounding box directly (i.e., <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">box[2]<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">box[3]<\/code>), as shown on <strong>Line 31<\/strong>. If not, we compute the width and height of the bounding box as <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">w=box[2]-box[0]<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">h=box[3]-box[1]<\/code>, as shown on <strong>Line 33<\/strong>. 
<\/p>\n<p>Finally, we use the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">add_patch<\/code> function from matplotlib to draw a green rectangle created with the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">plt.Rectangle<\/code> function, as shown on <strong>Lines 34-36<\/strong>.<\/p>\n<p>Now that we have implemented functions to visualize the prompts, we define the functions that will allow us to visualize the segmentation masks.<\/p>\n<p>We start with the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">show_mask<\/code> function, which takes as input a single predicted mask and the matplotlib axes object (i.e., <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">ax<\/code>) on which we want to plot and visualize the mask.<\/p>\n<p>On <strong>Line 40<\/strong>, we define the color, which is an array with the R, G, and B values and the alpha (opacity) value with which the mask is plotted on the image (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">[0, 0, 1, 0.6]<\/code>). Next, we get the height and width of the input mask (<strong>Line 41<\/strong>).<\/p>\n<p>Next, we reshape the mask to the shape <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">(h,w,1)<\/code>, reshape the color to the shape <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">(1,1,4)<\/code>, and multiply the two, as shown on <strong>Line 42<\/strong>. 
This broadcasts both the mask and the color to the shape <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">(h,w,4)<\/code>, and the two arrays are then multiplied elementwise, so pixels outside the mask become fully transparent.<\/p>\n<p>We finally use the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">imshow()<\/code> function to visualize the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">mask_image<\/code>, as shown on <strong>Line 43<\/strong>.<\/p>\n<p>We now define a function to visualize multiple masks together, which we will use in the next tutorial of this series.<\/p>\n<p>The <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">show_all_masks<\/code> function (<strong>Lines 46-63<\/strong>) takes as input the masks, as shown on <strong>Line 46<\/strong>.<\/p>\n<p>On <strong>Line 47<\/strong>, we sort the input masks in decreasing order of their area and store them in <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sorted_masks<\/code>. Next, we get the current matplotlib axes and disable autoscaling on <strong>Lines 48 and 49<\/strong>.<\/p>\n<p>On <strong>Line 51<\/strong>, we initialize an array of ones with <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">height=sorted_masks[0]['segmentation'].shape[0]<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">width=sorted_masks[0]['segmentation'].shape[1]<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">channels=4<\/code>. 
Furthermore, we set the alpha channel of this array (index 3 along the last axis) to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0<\/code>, so that the image starts out fully transparent, as shown on <strong>Line 58<\/strong>.<\/p>\n<p>Next, for each mask in the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sorted_masks<\/code> list (<strong>Line 59<\/strong>), we get the corresponding segmentation mask (<strong>Line 60<\/strong>) and create a <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">color_mask<\/code> with random R, G, and B values and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0.35<\/code> as the alpha value (<strong>Line 61<\/strong>). On <strong>Line 62<\/strong>, we assign this color to the pixels covered by the mask, so that each predicted segmentation mask is drawn with its own <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">color_mask<\/code>.<\/p>\n<p>Finally, on <strong>Line 63<\/strong>, we show the final image using matplotlib.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3SAM\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3SAM\"><strong>Segmentation with SAM<\/strong><\/a><\/h3>\n<p>Now that the implementation of the configurations and helper functions is complete, we are ready to dive deeper into the code, which allows us to use SAM and make segmentation predictions.<\/p>\n<p>Specifically, we will use prompts such as point coordinates or bounding box coordinates to segment objects of interest in real-time, as discussed above.<\/p>\n<p>Let us open the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sam.py<\/code> file and get started. 
<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"5\"># USAGE\n# python sam.py\n\n# import the necessary packages\nfrom pyimagesearch import config, utils\nfrom segment_anything import SamPredictor, sam_model_registry\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport cv2\nimport os\n\n# function to visualize and save segmentation\ndef visualize_and_save_segmentation(\n    prompt, model, query_index, save_multiple_masks=False\n):\n    input_points, input_labels, input_box = prompt\n\n    # Create output directory if it doesn't exist\n    if not os.path.exists(config.OUT_PATH):\n        os.makedirs(config.OUT_PATH)\n\n    # Define paths to save prompt and prediction images\n    prompt_path = os.path.join(\n        config.OUT_PATH, f\"{query_index}-{config.OUT_PROMPT_PATH}\"\n    )\n    prediction_path = os.path.join(\n        config.OUT_PATH, f\"{query_index}-{config.OUT_PRED_PATH}\"\n    )\n\n    # Plot the input image with prompts\n    plt.figure(figsize=(6, 6))\n    plt.imshow(image)\n    if input_points is not None:\n        utils.show_points(input_points, input_labels, plt.gca())\n    if input_box is not None:\n        utils.show_box(input_box, plt.gca())\n        input_box = input_box[None, :]\n\n    # Save prompt image\n    print(f\"[INFO] saving the prompt image to {prompt_path}...\")\n    plt.savefig(prompt_path)\n\n    # Make predictions using SAM and visualize them\n    masks, scores, _ = model.predict(\n        point_coords=input_points,\n        point_labels=input_labels,\n        box=input_box,\n        multimask_output=save_multiple_masks,\n    )\n\n    # Save the predicted image\n    print(f\"[INFO] saving the predicted image to {prediction_path}...\")\n    plt.figure(figsize=(6, 6))\n    for i, 
(mask, score) in enumerate(zip(masks, scores)):\n        plt.imshow(image)\n        utils.show_mask(mask, plt.gca())\n        plt.title(f\"Mask {i+1}, Score: {score:.3f}\", fontsize=18)\n        plt.axis(\"on\")\n    plt.savefig(prediction_path)\n    plt.close()<\/pre>\n<p>We start by importing the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">utils<\/code> modules, which contain the parameter configuration and the helper functions we will use for this tutorial (<strong>Line 5<\/strong>). <\/p>\n<p>Next, we import the components that allow us to use SAM for making predictions in our tutorial. Specifically, we get the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">SamPredictor<\/code> class and the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sam_model_registry<\/code> from <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">segment_anything<\/code>, as shown on <strong>Line 6<\/strong>.<\/p>\n<p>We also import other necessary packages such as <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">matplotlib<\/code> (<strong>Line 7<\/strong>), NumPy (<strong>Line 8<\/strong>), OpenCV (<strong>Line 9<\/strong>), and the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">os<\/code> module (<strong>Line 10<\/strong>), as shown. <\/p>\n<p>Now that we have imported the necessary packages, let us implement the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">visualize_and_save_segmentation<\/code> function (<strong>Lines 13-60<\/strong>), which takes the prompt, the SAM predictor model, and the index of the query as arguments and visualizes and saves the output segmentation masks. 
<\/p>\n<p>Note that this function expects the prompt to be a sequence of three entries: the points, the corresponding labels, and the bounding box coordinates on the image specifying the location of the object that we want to segment (<strong>Lines 13-15<\/strong>). Also, as discussed above, SAM accepts any combination of these parameters, and some can even be <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">None<\/code>, as we will see later.<\/p>\n<p>Furthermore, the function also takes an additional argument <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">save_multiple_masks<\/code>, indicating whether we want SAM to output a single prediction or multiple predictions. For now, we keep this argument as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">False<\/code> by default, as shown on <strong>Lines 13-15<\/strong>. <\/p>\n<p>On <strong>Line 16<\/strong>, we unpack <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_points<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_labels<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_box<\/code> from the prompt. <\/p>\n<p>We check if the folder where the output predictions will be stored (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.OUT_PATH<\/code>) already exists; if not, we create it (<strong>Lines 19 and 20<\/strong>). 
Furthermore, we define the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">prompt_path<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">prediction_path<\/code> on <strong>Lines 23 and 26<\/strong>, indicating where our output prompt visualization and segmentation visualization will be stored.<\/p>\n<p>On <strong>Lines 31 and 32<\/strong>, we visualize the input image with the help of matplotlib, as shown.<\/p>\n<p>Next, we visualize the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">input_points<\/code> or bounding box prompt on <strong>Lines 33-37<\/strong>. Specifically, we first check if <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">input_points<\/code> provided is <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">None<\/code> (<strong>Line 33<\/strong>), and if not, we use the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">show_points<\/code> function to visualize the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">input_points<\/code> (<strong>Line 34<\/strong>). <\/p>\n<p>Similarly, we visualize the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_box<\/code> by first checking if the provided entry is <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">None<\/code>, and if not, we use the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">show_box<\/code> function to visualize the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_box<\/code> (<strong>Lines 35 and 36<\/strong>). 
On <strong>Line 37<\/strong>, we add a leading dimension to the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_box<\/code> before passing it to SAM.<\/p>\n<p>Once we have the input prompt plotted, we save the visualization to the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">prompt_path<\/code> using <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">plt.savefig<\/code>, as shown on <strong>Line 41<\/strong>.<\/p>\n<p>Now that everything is set up, we can use our pre-trained SAM to predict the segmentation masks given the input prompts. <\/p>\n<p>On <strong>Lines 44-49<\/strong>, we call the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">model.predict<\/code> function, which takes as input the prompt coordinates (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">point_coords<\/code>), the corresponding labels (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">point_labels<\/code>), the bounding box (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">box<\/code>), and the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">multimask_output<\/code> parameter, as shown.<\/p>\n<p>This function outputs the masks, the corresponding score for each mask, and the low-resolution prediction logits (which we ignore here), as shown on <strong>Line 44<\/strong>.<\/p>\n<p>Finally, we are ready to visualize our SAM predictions. We start by iterating over the masks and the corresponding scores, as shown on <strong>Line 54<\/strong>.<\/p>\n<p>We first plot the input image using matplotlib, as shown on <strong>Line 55<\/strong>, and use the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">show_mask()<\/code> function to visualize the mask predicted by the model (<strong>Line 56<\/strong>). 
Finally, we assign a title to the plot with the corresponding score (<strong>Line 57<\/strong>) and use <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">plt.axis('on')<\/code> (<strong>Line 58<\/strong>) to show the axes with the pixel coordinates on the frame.<\/p>\n<p>Next, we save the plot using <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">plt.savefig<\/code> and close the figure on <strong>Lines 59 and 60<\/strong>.<\/p>\n<p>This completes the definition of our <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">visualize_and_save_segmentation<\/code> function. Let us now initialize our SAM and use our pipeline to make predictions in real-time.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"63\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"55\">if __name__ == \"__main__\":\n    # Load input image and convert to RGB\n    print(\"[INFO] loading the image...\")\n    image = cv2.imread(config.IMG_PATH[0])\n    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n\n    # Initialize and load SAM\n    print(\"[INFO] loading SAM...\")\n    sam = sam_model_registry[config.MODEL_TYPE](checkpoint=config.SAM_CHECKPOINT_PATH)\n    predictor = SamPredictor(sam)\n    predictor.set_image(image)\n\n    # Define prompts and visualize and save segmentation\n    prompts = [\n        (np.array([[410, 475]]), np.array([1]), None),\n        (np.array([[420, 495], [400, 400], [430, 330]]), np.array([1, 1, 1]), None),\n        (np.array([[420, 495], [400, 400], [430, 330]]), np.array([1, 1, 0]), None),\n        (None, None, np.array([365, 243, 480, 515])),\n        (np.array([[420, 495]]), np.array([0]), np.array([365, 243, 480, 515])),\n    ]\n\n    for i, prompt in enumerate(prompts):\n        print(f\"[INFO] creating prompt for query {i}...\")\n        visualize_and_save_segmentation(prompt, predictor, i)<\/pre>\n<p>We start by loading our image using OpenCV 
from the path defined by <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.IMG_PATH[0]<\/code> and converting it from BGR to RGB color space using the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">cv2.cvtColor<\/code> function, as shown on <strong>Lines 66 and 67<\/strong>.<\/p>\n<p>Next, we initialize our SAM using <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sam_model_registry<\/code> with the type of model (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.MODEL_TYPE<\/code>), which we take as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">vit_h<\/code> for this tutorial. We also provide the path where we stored the SAM checkpoint (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.SAM_CHECKPOINT_PATH<\/code>) (<strong>Line 71<\/strong>). <\/p>\n<p>On <strong>Lines 72 and 73<\/strong>, we create the predictor using the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">SamPredictor()<\/code> class and use the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">predictor.set_image<\/code> method to set our image as the input to SAM.<\/p>\n<p>We are now ready to take the prompts as input and make predictions using our pretrained SAM in real-time.<\/p>\n<p>On <strong>Lines 76-82<\/strong>, we define a list of prompts, which we will use as input to SAM to understand its different segmentation capabilities.<\/p>\n<p>We first take a single point as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_point<\/code> (<strong>Line 77<\/strong>) and take the corresponding label to be <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">1<\/code> (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_label<\/code>) since we want to predict a mask that includes this point.<\/p>\n<p>Similarly, let us try to use multiple points to further 
define and specify which regions we want to include or exclude in the segmentation mask.<\/p>\n<p>On <strong>Line 78<\/strong>, we take three points as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_point<\/code>, which we want to include in our segmentation mask. We also take all the corresponding labels to be <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">1<\/code> (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_label<\/code>) since we want to predict a mask that includes all three points.<\/p>\n<p>Next, we take the same three points as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_point<\/code> (<strong>Line 79<\/strong>), but we intend to predict a segmentation mask of the region that includes the first two points and excludes the third point. Thus, we take the corresponding label to be <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">1<\/code> for the first two points and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">0<\/code> for the third point.<\/p>\n<p>Then, we consider the case where we only provide a bounding box for the plant on the slab in the image. We set the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">input_box<\/code> parameter, as shown on <strong>Line 80<\/strong>. 
Note that these are the coordinates of the box bounding for the plant on the shelf, and we will see later how we can predict these.<\/p>\n<p>On <strong>Line 80<\/strong>, we set <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_point<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_label<\/code> to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">None<\/code> since we predict based on a bounding box prompt.<\/p>\n<p>Finally, it is time to understand how combining the bounding box and point coordinate prompts can further specify the region we want to segment. <\/p>\n<p>Let us segment a region within the bounding box, which excludes the first point.<\/p>\n<p>On <strong>Line 81<\/strong>, we take the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">input_box<\/code> defined above and take a point on the image. Then, we take the corresponding label to be <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">0<\/code> for the point since we want to exclude that region from the predicted segmentation mask (<strong>Line 81<\/strong>).<\/p>\n<p>Now that we have defined all our prompts, let us go ahead and use SAM for segmentation and visualize our predictions in real-time.<\/p>\n<p>We iterate over the prompts and use the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">visualize_and_save_segmentation<\/code>, which takes as input our prompt list, the SAM predictor and the index of the specific prompt from the prompt list that we want to use for segmentation.<\/p>\n<p><strong>Figure 2<\/strong> shows the output predictions from our SAM pipeline.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh4.googleusercontent.com\/jE28Eq1CbDd98nvicQhRITz2d0rwsKjR8twN06yTIfPRBL_ZO-NXjFs9qP2krUKj6VaIxE0jg9O_iVHGbpZry9J1sWto3ZzlPR4KuW53S9Er40dGOygiSIAvOdiay3MD9JdzJfQHniqbaT-tvq76o_8\"  rel=\"noreferrer noopener\"><img 
loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/jE28Eq1CbDd98nvicQhRITz2d0rwsKjR8twN06yTIfPRBL_ZO-NXjFs9qP2krUKj6VaIxE0jg9O_iVHGbpZry9J1sWto3ZzlPR4KuW53S9Er40dGOygiSIAvOdiay3MD9JdzJfQHniqbaT-tvq76o_8\" alt=\"\" width=\"279\" height=\"500\"\/><\/a><figcaption><strong>Figure 2:<\/strong> Prompts (<em>left<\/em>) and Segmentation predictions (<em>right<\/em>) from SAM (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<p>Let us discuss in detail the results shown in <strong>Figure 2<\/strong>.<\/p>\n<p>In <strong>row 1<\/strong>, we have a single point as the prompt (<em>left<\/em>), and SAM outputs a segmentation mask (<em>right<\/em>, in blue) that segments out the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">pot<\/code> region on which the point lies.<\/p>\n<p>In <strong>row 2<\/strong>, we have three points that span the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">plant with pot<\/code> on the slab, and SAM correctly outputs a segmentation mask (<em>right<\/em>, in blue) that segments out the entire plant region on which the points lie.<\/p>\n<p>In <strong>row 3<\/strong>, we have three points, and we want to segment a region such that the first two points are included and the third (red) point on the brown branch is excluded. SAM handles this prompt well: it correctly segments the whole plant region except the part of the brown branch where the red point lies.<\/p>\n<p>In <strong>row 4<\/strong>, we simply provide a bounding box as a prompt, and SAM segments out the plant inside the box.<\/p>\n<p>Finally, in <strong>row 5<\/strong>, we use a combination of bounding box and point coordinates where we want to segment the object within the bounding box but exclude the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">pot<\/code> region where the red point lies. 
Note that the SAM prediction captures these intricate details of the prompts and segments out the plant except the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">pot<\/code> region, as shown on the right.<\/p>\n<p>Engineering different prompts allows us to control the SAM predictions to better segment the regions of interest.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3TextPrompts\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3TextPrompts\"><strong>Segmenting with SAM and Text Prompts<\/strong><\/a><\/h3>\n<p>In the previous section, we discussed using point coordinates and bounding boxes to make predictions using the promptable SAM.<\/p>\n<p>The original SAM paper (<a href=\"https:\/\/arxiv.org\/abs\/2304.02643\"  rel=\"noreferrer noopener\">link<\/a>) also mentioned that SAM can use text prompts to segment desired objects. However, the current release of the official code and models does not directly support segmentation with text prompts.<\/p>\n<p>In this section, we will tackle this by using an off-the-shelf model and integrating it with our SAM pipeline. <\/p>\n<p>The Grounding DINO model (<a href=\"https:\/\/arxiv.org\/pdf\/2303.05499.pdf\"  rel=\"noreferrer noopener\">link<\/a>) takes in a text prompt and an image and outputs bounding box coordinates in the image corresponding to the object described by the text. 
For example, given that we provide a text prompt <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">dog<\/code>, the Grounding DINO model will simply output boxes bounding the regions where dogs occur in that image.<\/p>\n<p>Note that this allows us to build a simple interface with SAM where we can provide a text prompt to the off-the-shelf Grounding DINO model, and it will output bounding boxes corresponding to that text which can then be directly used to prompt SAM, as we discussed in the previous section.<\/p>\n<p>Let us go ahead and open the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">gdino_integration.py<\/code> file to see this in action.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"6\"># USAGE: python gdino_integration.py\n\nimport cv2\nimport numpy as np\nimport torch\nfrom groundingdino.util.inference import load_image, load_model, predict\nfrom segment_anything import SamPredictor, sam_model_registry\n\nfrom pyimagesearch import config\nfrom sam import visualize_and_save_segmentation\n\n\ndef get_bounding_boxes(img_path, text_prompt, box_threshold, text_threshold):\n    # Load GDINO model and input image\n    model = load_model(config.GDINO_CONFIG, config.GDINO_CHECKPOINT_PATH)\n    _, image = load_image(img_path)\n\n    boxes, _, _ = predict(\n        model=model,\n        image=image,\n        caption=text_prompt,\n        box_threshold=box_threshold,\n        text_threshold=text_threshold,\n    )\n\n    return boxes<\/pre>\n<p>We start by importing the OpenCV library (<strong>Line 3<\/strong>), the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">numpy<\/code> library (<strong>Line 4<\/strong>), and the <code class=\"EnlighterJSRAW\" 
data-enlighter-language=\"python\">torch<\/code> (PyTorch) library (<strong>Line 5<\/strong>).<\/p>\n<p>Next, we import the Grounding DINO inference helpers (<strong>Line 6<\/strong>) and the SAM components discussed above (<strong>Line 7<\/strong>). Furthermore, we import the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">config<\/code> file and the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">visualize_and_save_segmentation<\/code> function defined above in the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">sam.py<\/code> file on <strong>Lines 9 and 10<\/strong>.<\/p>\n<p>Now that we have imported the necessary packages, we implement the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">get_bounding_boxes()<\/code> function (<strong>Lines 13-26<\/strong>), which uses the Grounding DINO model to predict bounding boxes based on the text prompt we provide.<\/p>\n<p>The function takes as input the image path, the text prompt, and the bounding box and text thresholds that the Grounding DINO model expects, as shown on <strong>Line 13<\/strong>.<\/p>\n<p>We use the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">load_model<\/code> function to initialize and load the Grounding DINO model. 
This function takes as input the path to the Grounding DINO config file and the path to the pre-trained checkpoint (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.GDINO_CONFIG<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.GDINO_CHECKPOINT_PATH<\/code>), as shown on <strong>Line 15<\/strong>.<\/p>\n<p>Next, we use the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">load_image<\/code> function, which takes as input the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">img_path<\/code> and outputs the image in the format expected by the Grounding DINO model (<strong>Line 16<\/strong>).<\/p>\n<p>Then, we call the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">predict<\/code> function from Grounding DINO to get the bounding boxes corresponding to the object in the text prompt. The <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">predict<\/code> function takes as input the pre-trained <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">model<\/code>, the input <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">image<\/code>, the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">text_prompt<\/code> describing the object to detect (passed as the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">caption<\/code> argument), and the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">box_threshold<\/code> and <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">text_threshold<\/code> scores to be used by the Grounding DINO model for the predictions (<strong>Lines 18-24<\/strong>).<\/p>\n<p>Finally, we return the set of bounding boxes (i.e., <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">boxes<\/code>) predicted by the Grounding DINO model for the text prompt we provided (<strong>Line 26<\/strong>).<\/p>\n<p>This completes the implementation of our <code class=\"EnlighterJSRAW\" 
data-enlighter-language=\"python\">get_bounding_boxes()<\/code> function, and we are now ready to make predictions with our models.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"29\" data-enlighter-title=\"SAM from Meta AI (Part 1): Segmentation with Prompts\" data-enlighter-group=\"7\">if __name__ == \"__main__\":\n    # Load input image and convert to RGB\n    print(\"[INFO] Loading image...\")\n    image = cv2.imread(config.IMG_PATH[0])\n    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n    W, H = image.shape[1], image.shape[0]\n\n    # Initialize and load SAM\n    print(\"[INFO] Loading SAM model...\")\n    sam = sam_model_registry[config.MODEL_TYPE](checkpoint=config.SAM_CHECKPOINT_PATH)\n    predictor = SamPredictor(sam)\n    predictor.set_image(image)\n\n    # Process text prompts\n    print(\"[INFO] Generating masks from SAM...\")\n    for prompt_text in config.TEXT_PROMPT:\n        # Get bounding boxes for the given prompt\n        boxes = get_bounding_boxes(\n            config.IMG_PATH[0], prompt_text, config.BOX_TRESHOLD, config.TEXT_TRESHOLD\n        )\n\n        for index, bbox in enumerate(boxes):\n            # Scale the normalized (cx, cy, w, h) box to pixels and convert it to (x0, y0, x1, y1)\n            box = torch.Tensor(bbox) * torch.Tensor([W, H, W, H])\n            box[:2] -= box[2:] \/ 2\n            box[2:] += box[:2]\n            x0, y0, x1, y1 = box.int().tolist()\n\n            # Prepare SAM prompt\n            input_box = np.array([x0, y0, x1, y1])\n            input_point = None\n            input_label = None\n            segment_prompt = [input_point, input_label, input_box]\n\n            # Segment using the prepared prompt\n            visualize_and_save_segmentation(segment_prompt, predictor, index)<\/pre>\n<p>We start by loading our image using OpenCV from the path defined by <code data-enlighter-language=\"python\" 
class=\"EnlighterJSRAW\">config.IMG_PATH[0]<\/code> and converting it from the BGR to RGB color space using the <code data-enlighter-language=\"python\" class=\"EnlighterJSRAW\">cv2.cvtColor<\/code> function, as shown on <strong>Lines 32 and 33<\/strong>. We also obtain our input image\u2019s width and height, as shown on <strong>Line 34<\/strong>.<\/p>\n<p>Next, we initialize SAM using <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">sam_model_registry<\/code> with the type of model (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.MODEL_TYPE<\/code>) as discussed above. We also provide the path where we stored the SAM checkpoint (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.SAM_CHECKPOINT_PATH<\/code>), as shown on <strong>Line 38<\/strong>.<\/p>\n<p>On <strong>Lines 39 and 40<\/strong>, we create the predictor using the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">SamPredictor()<\/code> class and call <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">predictor.set_image<\/code> to set our image as the input to SAM.<\/p>\n<p>Now that we have SAM and our input image set up, we can take the text prompts as input and make predictions using our integrated Grounding DINO and SAM pipeline.<\/p>\n<p>We will test our pipeline by making predictions for each of the three text entries in our <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.TEXT_PROMPT<\/code> list (i.e., <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">plant<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">apples<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">vase<\/code>) one by one.<\/p>\n<p>We start by iterating over the entries in <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">config.TEXT_PROMPT<\/code> 
(<strong>Line 44<\/strong>), and for each entry, we get the corresponding set of bounding boxes by calling the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">get_bounding_boxes()<\/code> function (<strong>Lines 46-48<\/strong>).<\/p>\n<p>Note that there might be multiple bounding boxes corresponding to the given object in the text prompt, as that object might have multiple instances in the image.<\/p>\n<p>Then we iterate over each <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">bbox<\/code> in the boxes list (<strong>Line 50<\/strong>). For each <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">bbox<\/code>, we multiply the normalized output coordinates by <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">torch.Tensor([W, H, W, H])<\/code> to scale them to pixel units (<strong>Line 52<\/strong>). Next, we convert the box from center format (center x, center y, width, height) to corner format (top-left and bottom-right corners), as shown on <strong>Lines 53 and 54<\/strong>.<\/p>\n<p>Then we convert the corner coordinates to integer values and unpack them into <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x0<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">y0<\/code>, <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x1<\/code>, and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">y1<\/code>, as shown on <strong>Line 55<\/strong>.<\/p>\n<p>Now that we have the bounding box coordinates for the object in the given text prompt, we create a numpy array and store them as <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_box<\/code>, as shown on <strong>Line 58<\/strong>. 
Furthermore, we set the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_point<\/code> and <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">input_label<\/code> to <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">None<\/code> (<strong>Lines 59 and 60<\/strong>) and create our prompt list (<strong>Line 61<\/strong>) in the format that the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">visualize_and_save_segmentation<\/code> function expects, as discussed in detail above.<\/p>\n<p>Finally, we call the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">visualize_and_save_segmentation()<\/code> function (<strong>Line 64<\/strong>), which takes as input the prompt, the SAM predictor, and the index of the box to output segmentation masks.<\/p>\n<p><strong>Figure 3<\/strong> shows the results of our text prompt segmentation pipeline.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter is-resized\"><a href=\"https:\/\/lh3.googleusercontent.com\/6lHrCROb33NqXjLPL-h85-b6kBrv5bOpLlsoIAygSncIll2Ga03hpTmqCt8ffk48dvvBx1YZNXYMD_5N82DMI8HSicDA7VnwhsLmj9j9CBtTLFvyeouHARbNkJX5KTSNaXB2WoefMUQGY6Brif-6IRg\"  rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/6lHrCROb33NqXjLPL-h85-b6kBrv5bOpLlsoIAygSncIll2Ga03hpTmqCt8ffk48dvvBx1YZNXYMD_5N82DMI8HSicDA7VnwhsLmj9j9CBtTLFvyeouHARbNkJX5KTSNaXB2WoefMUQGY6Brif-6IRg\" alt=\"\" width=\"461\" height=\"500\"\/><\/a><figcaption><strong>Figure 3:<\/strong> Predicted bounding box from Grounding DINO (<em>left<\/em>) and corresponding SAM segmentation (<em>right<\/em>) (source: image by the author).<\/figcaption><\/figure>\n<\/div>\n<p>In <strong>row 1<\/strong>, we notice that Grounding DINO predicted the bounding box for the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">plant<\/code> text prompt, that is, the small plant on the top shelf, and SAM segmented the plant (<em>right<\/em>) accurately.<\/p>\n<p>Similarly, in <strong>row 
2<\/strong>, Grounding DINO predicted the bounding box for the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">apple<\/code> text prompt, that is, the small apple near the tap, and SAM segmented the apple (<em>right<\/em>, blue color) very well. Notice that this is a pretty amazing result since the apple is very small and not clearly visible in the image.<\/p>\n<p>Finally, in <strong>row 3<\/strong>, Grounding DINO predicted the bounding box for the <code class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">vase<\/code> text prompt, that is, the vase below the potted plant, and SAM segmented exactly the vase (<em>right<\/em>, blue color) part, excluding the plant. <\/p>\n<p>After visualizing these results, we can clearly see that SAM has amazing capabilities as a foundational segmentation model and can be used to segment various objects in any image in a zero-shot way without fine-tuning.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h2Summary\"\/>\n<h2><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h2Summary\"><strong>Summary<\/strong><\/a><\/h2>\n<p>In this tutorial, we tried to gain a holistic understanding of SAM, a foundational segmentation model. Specifically, we discussed SAM\u2019s development and pre-training process and implemented code to predict segmentation masks using different prompts.<\/p>\n<p>Furthermore, we discussed how different prompts can be combined to prompt SAM and get control over the desired region to segment in input images.<\/p>\n<p>Finally, we discussed how SAM can seamlessly be integrated with the off-the-shelf Grounding DINO model to segment objects with text-based prompts.<\/p>\n<p>In the next tutorial, we will further understand how SAM can be integrated with other systems and used to perform diverse downstream tasks.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Credits\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Credits\"><strong>Credits<\/strong><\/a><\/h3>\n<p>This blog post is inspired by the official SAM paper and GitHub code release for SAM (<a href=\"https:\/\/github.com\/facebookresearch\/segment-anything\"  rel=\"noreferrer noopener\">https:\/\/github.com\/facebookresearch\/segment-anything<\/a>) and Grounding DINO (<a href=\"https:\/\/github.com\/IDEA-Research\/GroundingDINO\"  rel=\"noreferrer 
noopener\">https:\/\/github.com\/IDEA-Research\/GroundingDINO<\/a>).<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" id=\"h3Citation\"\/>\n<h3><a href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/#TOC-h3Citation\"><strong>Citation Information<\/strong><\/a><\/h3>\n<p><strong>Chandhok, S.<\/strong> \u201cSAM from Meta AI (Part 1): Segmentation with Prompts,\u201d <em>PyImageSearch<\/em>, P. Chugh, A. R. Gosthipaty, S. Huot, K. Kidriavsteva, and R. Raha, eds., 2023, <a href=\"https:\/\/pyimg.co\/0ivy4\"  rel=\"noreferrer noopener\">https:\/\/pyimg.co\/0ivy4<\/a> <\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"classic\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"false\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">@incollection{Chandhok_2023_SAM-Part1,\n  author = {Shivam Chandhok},\n  title = {{SAM} from {Meta AI} (Part 1): Segmentation with Prompts},\n  booktitle = {PyImageSearch},\n  editor = {Puneet Chugh and Aritra Roy Gosthipaty and Susan Huot and Kseniia Kidriavsteva and Ritwik Raha},\n  year = {2023},\n  url = {https:\/\/pyimg.co\/0ivy4},\n}<\/pre>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/\">SAM from Meta AI (Part 1): Segmentation with Prompts<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/\">PyImageSearch<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<div class=\"mh-excerpt\"><p>Table of Contents SAM from Meta AI (Part 1): Segmentation with Prompts Segment Anything Training SAM Inference with SAM Project Structure Configuring Your Development Environment Need Help Configuring Your Development Environment? Creating Our Configuration File Implementing Visualization Functions Segmentation with\u2026<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/2023\/09\/11\/sam-from-meta-ai-part-1-segmentation-with-prompts\/\">SAM from Meta AI (Part 1): Segmentation with Prompts<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/pyimagesearch.com\/\">PyImageSearch<\/a>.<\/p>\n<\/div>","protected":false},"author":2021,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"slim_seo":{"title":"SAM from Meta AI (Part 1): Segmentation with Prompts - ITTeacherITFreelance.hk","description":"Table of Contents SAM from Meta AI (Part 1): Segmentation with Prompts Segment Anything Training SAM Inference with SAM Project Structure Configuring Your 
Devel"},"footnotes":""},"categories":[10700],"tags":[10734],"_links":{"self":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/329280"}],"collection":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/users\/2021"}],"replies":[{"embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/comments?post=329280"}],"version-history":[{"count":1,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/329280\/revisions"}],"predecessor-version":[{"id":329281,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/329280\/revisions\/329281"}],"wp:attachment":[{"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/media?parent=329280"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/categories?post=329280"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itteacheritfreelance.hk\/wordpress\/index.php\/wp-json\/wp\/v2\/tags?post=329280"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}