class: center, middle, inverse, title-slide

.title[
# Computer Vision System for Automated Medicinal Plant Identification
]
.subtitle[
## NUMBATs’ TALK
]
.author[
### Jayani P.G. Lakshika
]
.institute[
### Monash University, Australia
]
.date[
### October 14, 2022
]

---
class: center, middle

## My SUPERVISOR

<img src="madam.jpg" width="30%" style="display: block; margin: auto;" />

Dr. Thiyanga S. Talagala

Senior Lecturer in the Department of Statistics, Faculty of Applied Sciences at the University of Sri Jayewardenepura

PhD in Statistics, Monash University, Australia

https://thiyanga.netlify.app/

---
background-image: url("bg1.png")
background-size: 900px
background-position: 90% 8%

---

## Main objective

- **.red[To develop an automated algorithm that classifies medicinal plants using a statistical machine learning approach]**

<img src="pp.png" width="30%" style="display: block; margin: auto;" />

---

# Limitations

**Why medicinal plant leaves?**

--

- We use leaf images because they contain a large and diverse set of features, such as **.orange[shape, veins, edge features, and apices]**.

<img src="simple_leaf_parts.png" style="width: 40%" /><img src="full.png" style="width: 40%" />

--

- We used **.red[non-diseased leaves with a simple arrangement]**.

--

- We used leaves **.red[without a petiole]**.

---

# Significance of the study

- **.red[To avoid misidentifying medicinal plants in Sri Lanka]**

- Our algorithm is based on leaf images. Since leaves are **.red[relatively easy to obtain without damaging the plants]**, developing the algorithm does not harm the plants.

- Our algorithm works as a hierarchical classification system. Therefore, even when the exact species name is unknown, the first two levels can still be followed. As a result, the **.red[misidentification rate and computation time decrease]**.
<img src="diagram.png" width="60%" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Methodology

---

## Workflow

<img src="algo_new.png" width="50%" style="display: block; margin: auto;" />

---

## Image Acquisition

- A database of leaf images of medicinal plants in Sri Lanka **.orange[is not yet available]**.

- **.green[Establish a repository of medicinal plant images]**.

- We conducted a preliminary study of **.red[471 medicinal plants]** and recorded characteristics such as **.purple[leaf arrangement, shape, and edge type]**.

<img src="pic.png" style="width: 80%" />

---

- We collected **.red[1099]** leaf images from **.red[31 species]**.

<img src="actual_image_collection_process.png" style="width: 100%" />

**MedLEA: Medicinal LEAf**

<img src="MedLEA.png" width="55%" style="display: block; margin: auto;" />

The repository is publicly available through the **.red[open-source R package MedLEA]** at https://CRAN.R-project.org/package=MedLEA for research reproducibility.

**.red[Total downloads: more than 1000]**

---

<img src="ima.png" style="width: 100%" />

<img src="datasets.png" style="width: 60%" />

---

## Methodology Diagram

<img src="ov.png" width="110%" height="110%" style="display: block; margin: auto;" />

---

# Image processing

- Image processing takes an image as input and produces a **.green[modified image]** that is better suited to **.orange[morphological analysis and feature extraction]**.

- Image processing is an essential step in the identification process: it covers **.red[noise reduction, background subtraction, and content enhancement]**.

<img src="image_processing.png" style="width: 100%" />

--

<img src="close_holes.png" style="width: 30%" /> <img src="remove_stalk.png" style="width: 30%" />

---

## Methodology Diagram

<img src="ov.png" width="110%" height="110%" style="display: block; margin: auto;" />

---

# Why is feature extraction important?

- Recently, many researchers have used deep learning methods such as convolutional neural networks (CNNs) to classify plants directly from plant images.

- Although deep learning models have achieved great success, their interpretability and transparency are limited.

<img src="CNN.png" width="110%" height="110%" style="display: block; margin: auto;" />

---

# Features

- When identifying plant species from leaf images, the **.green[features of the leaves play a central role]**.

--

- Previous research lets an algorithm such as a **.red[CNN]** extract features by itself and then perform the classification.

--

- As a result, the features are **.purple[hard to interpret and generalize]**.

--

- We introduce **.red[pre-calculated features]** that are **.green[easy to interpret and generalize]**. They are also **.green[computationally efficient]**.

--

<img src="feature_hie.png" width="80%" style="display: block; margin: auto;" />

- Altogether, we identified **.orange[52 features]**.
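---

### Sketch: pre-calculated shape features

To make the idea of pre-calculated, interpretable features concrete, here is a minimal sketch that computes a few shape descriptors from a leaf outline. It is an illustration under stated assumptions, not the implementation used in this work: the function name `shape_features`, the axis-aligned bounding box standing in for physiological length and width, and the exact formulas are all assumed for the example.

```python
import math
from itertools import combinations


def shape_features(contour):
    """Compute simple shape descriptors for a closed outline.

    `contour` is a list of (x, y) vertices ordered around the outline.
    All definitions here are illustrative assumptions.
    """
    closed = contour + contour[:1]  # repeat the first vertex to close the polygon
    edges = list(zip(closed, closed[1:]))

    # Shoelace formula for the polygon area.
    area = abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in edges)) / 2
    perimeter = sum(math.dist(p, q) for p, q in edges)

    # Assumption: the axis-aligned bounding box approximates
    # physiological length (longer side) and width (shorter side).
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    side_a, side_b = max(xs) - min(xs), max(ys) - min(ys)
    length, width = max(side_a, side_b), min(side_a, side_b)

    # Diameter: the largest distance between any two outline points.
    diameter = max(math.dist(p, q) for p, q in combinations(contour, 2))

    return {
        "area": area,
        "perimeter": perimeter,
        "length": length,
        "width": width,
        "diameter": diameter,
        # Assumed definition: share of the bounding box covered by the leaf.
        "rectangularity": area / (length * width),
        # Assumed definition: diameter relative to length.
        "narrow_factor": diameter / length,
        # Assumed definition: perimeter relative to length plus width.
        "perimeter_ratio_length_width": perimeter / (length + width),
    }


# Hypothetical outline: a 4 x 2 rectangle instead of a real leaf contour.
features = shape_features([(0, 0), (4, 0), (4, 2), (0, 2)])
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Each value has a direct geometric reading, which is what makes this kind of feature easier to interpret than features learned inside a CNN.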
---

### Shape features

<img src="shape.png" width="80%" style="display: block; margin: auto;" />

**New shape features: correlation of Cartesian coordinates, number of convex points, number of minimum and maximum points**

<img src="mn.png" width="45%" style="display: block; margin: auto;" />

---
class: center, middle

**Diameter calculation**

<img src="d_cal.png" style="width: 68%" />

---

### Color features

<img src="leaf_img.png" width="50%" style="display: block; margin: auto;" />

<img src="col_eq.png" width="60%" style="display: block; margin: auto;" />

---

### Texture features

<img src="text_tb.png" width="90%" style="display: block; margin: auto;" />

---

<img src="t1.png" width="50%" style="display: block; margin: auto;" />

<img src="t2.png" width="50%" style="display: block; margin: auto;" />

---

### Scagnostic features

<img src="scp.png" width="70%" style="display: block; margin: auto;" />

---

<img src="sca.png" width="60%" style="display: block; margin: auto;" />

---

## Visualization of Leaf Images in the Feature Space

Example: Flavia

- LDA is a supervised dimensionality reduction technique, and PCA is an unsupervised one.

- The first **.green[three principal components (PCs)]** account for approximately **.red[83%]** of the total variance in the original data.

<img src="pcaflavia.png" width="70%" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Let's see the 3D View

---

<iframe width="560" height="315" src="https://www.youtube.com/embed/OGcDImFfqGM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---

<img src="ldaflavia.png" width="100%" style="display: block; margin: auto;" />

- Under both experimental settings, class separation is **.orange[clearer in the LDA space than in the PCA space]**.
A likely reason is that LDA is a supervised technique, whereas PCA is unsupervised.

---

---
class: inverse, center, middle

# Algorithm Development

---

- Our medicinal plant classification algorithm consists of **.green[two processes: a training process and a test process]**.

- Our classification algorithm operates on the **.purple[features extracted from the leaf images]**.

- The training process has four main steps: **.orange[i) image processing, ii) feature extraction, iii) image labelling, and iv) training the algorithm]**.

- In the test process, the image processing and feature extraction steps are applied to the **.orange[new image before it is fed to the pre-trained model]**.

- We mainly use the **.red[Random Forest, Gradient Boosting and Extreme Gradient Boosting]** classification algorithms in our research.

<img src="algo.png" width="70%" style="display: block; margin: auto;" />

---

### MEDIPI: MEDIcinal Plant Identification

<img src="sticker1.png" width="20%" style="display: block; margin: auto;" />

Our medicinal plant classification algorithm is named **.red[MEDIPI]**.

<img src="algo_new_1.png" width="60%" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Discussion & Conclusions

---

# Hierarchical Approach

- Our algorithm works as a **.red[hierarchical classification system]**. The hierarchy contains **.purple[3 levels]**. The first level classifies images according to **.red[shape]**. The second level classifies them according to **.red[edge type]**. The bottom level classifies the **.red[plant species]**.
<img src="diagram.png" style="width: 90%" />

---

## Hierarchy of Actual leaf image dataset

<img src="Classification_hierarchy_new.png" style="width: 90%" />

---

## Hierarchy of Flavia leaf image dataset

<img src="hie.png" style="width: 90%" />

---
class: center, middle

### Experiments

<img src="exp.png" style="width: 100%" />

To get accurate results, the **.orange[training and test sets must come from the same dataset]**.

---

## Comparing results: features of all categories vs. shape features only

Training and test datasets from the same dataset

<img src="comp1.png" width="80%" style="display: block; margin: auto;" />

---

## Comparing results: features of all categories vs. shape features only

Training and test datasets from different datasets

<img src="comp2.png" width="70%" style="display: block; margin: auto;" />

---

# Compare algorithms

<img src="tb11.png" width="80%" style="display: block; margin: auto;" />

The model trained with the **.red[Random Forest]** algorithm provides the highest accuracy.

---

## Linear Discriminant Analysis

- A high-dimensional visualization approach

- It visualizes what is **.red[happening inside]** the trained algorithm and provides **.red[transparency]** for our black-box model

<img src="actp7.png" style="width: 90%" />

---

<img src="actp10.png" style="width: 110%" />

---

<img src="actp11.png" style="width: 110%" />

---

<img src="actp12.png" style="width: 110%" />

---

<img src="actp13.png" style="width: 110%" />

---

<img src="actp14.png" style="width: 110%" />

---

<img src="actp9.png" style="width: 110%" />

---

## Conclusions

- The model trained with the **.red[random forest]** algorithm provides the highest accuracy.

--

- Our algorithm works as a **.red[hierarchical classification system]**.

--

- We observe that **.red[shape features]** such as (i) x value of Center (cx), (ii) y value of Center (cy), (iii) Entropy, (iv) Perimeter ratio of length and width, (v) Diameter, (vi) Area convexity, (vii) Perimeter convexity, (viii) Narrow Factor, (ix) Area ratio convexity, (x) Physiological length, (xi) Physiological width, (xii) Rectangularity, and (xiii) Eccentricity are more important when classifying leaf images at the **.red[first level]** of the hierarchy.

- **.red[Scagnostic features]** such as (i) Monotonic contour, (ii) Convex polar, (iii) Convex contour, (iv) Striated polar, (v) Striated contour, and (vi) Skinny contour are more important when identifying leaf species at the **.red[bottom level]** of the hierarchy.

--

- The **.red[MEDIPI]** algorithm yields results comparable in accuracy to existing state-of-the-art techniques in the field.

--

- To get accurate results, the **.red[training and test sets must come from the same dataset]**.

--

- We observe that **.red[shape features alone are not sufficient]** to classify leaf images.
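---

### Sketch: three-level hierarchical classification

The three-level hierarchy (shape, then edge type, then species) can be sketched in a few lines. This is a hedged illustration only: the `NearestCentroid` class is a deliberately simple stand-in for the tree ensembles named in the talk, and the class names, feature vectors, and labels are hypothetical.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Stand-in classifier (assumption); the talk uses Random Forest and boosting."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One mean feature vector per class.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda c: math.dist(row, self.centroids[c]))


class HierarchicalClassifier:
    """Level 1: shape. Level 2: edge type within shape. Level 3: species."""

    def fit(self, X, shapes, edges, species):
        self.shape_model = NearestCentroid().fit(X, shapes)
        by_shape, by_shape_edge = defaultdict(list), defaultdict(list)
        for i, (s, e) in enumerate(zip(shapes, edges)):
            by_shape[s].append(i)
            by_shape_edge[(s, e)].append(i)
        # One edge-type model per shape, one species model per (shape, edge) pair.
        self.edge_models = {
            s: NearestCentroid().fit([X[i] for i in idx], [edges[i] for i in idx])
            for s, idx in by_shape.items()
        }
        self.species_models = {
            key: NearestCentroid().fit([X[i] for i in idx], [species[i] for i in idx])
            for key, idx in by_shape_edge.items()
        }
        return self

    def predict_one(self, row):
        shape = self.shape_model.predict_one(row)
        edge = self.edge_models[shape].predict_one(row)
        return shape, edge, self.species_models[(shape, edge)].predict_one(row)


# Hypothetical toy data: three made-up features per leaf image.
X = [[1.0, 0.1, 0.0], [1.0, 0.1, 1.0], [1.0, 0.9, 0.0], [3.0, 0.1, 0.0], [3.0, 0.9, 0.0]]
shapes = ["round", "round", "round", "long", "long"]
edges = ["smooth", "smooth", "toothed", "smooth", "toothed"]
species = ["species_A", "species_B", "species_C", "species_D", "species_E"]

model = HierarchicalClassifier().fit(X, shapes, edges, species)
print(model.predict_one([1.1, 0.2, 0.9]))
```

Because each level returns its own label, the first two predictions remain usable even when the species-level prediction is uncertain.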
---
class: inverse, center, middle

# Thesis Outcome

---
class: center, middle

## MedLEA

<img src="git_sc.png" width="80%" style="display: block; margin: auto;" />

https://CRAN.R-project.org/package=MedLEA

---
class: center, middle
background-color: #c2a5cf

## Research paper

<img src="researchpaper_new.png" width="105%" style="display: block; margin: auto;" />

https://arxiv.org/abs/2106.08077

---

## Web Application for Leaf Image Identification

<img src="shinyapp.png" width="70%" style="display: block; margin: auto;" />

---

<img src="octave.jpg" width="80%" style="display: block; margin: auto;" />

---
background-color: #bdbdbd

# Applied Statistics Conference 2021 (Slovenia)

<img src="Poster.png" width="75%" style="display: block; margin: auto;" />

---
background-color: #bdbdbd

# Young Scientists' Conference on Multidisciplinary Research (YSCMR 2021) organized by National Institute of Fundamental Studies (NIFS) in Sri Lanka

<img src="ycmr.png" width="75%" style="display: block; margin: auto;" />

---

# Other talks

1) **Estadistica 2021** – the Annual Statistics Day organized by the Statistics Society of University of Sri Jayewardenepura, Sri Lanka

<img src="edistica_talk.png" width="40%" style="display: block; margin: auto;" />

2) **Vx Tech Talk 2021**, Vizuamatix Pvt. Ltd, Sri Lanka

---

# Further Research

- Develop algorithms to identify plant diseases in Sri Lanka

--

- Expand the species collection

--

- Explore differences in plant features according to spatial distribution and climate conditions (for example, Gotukola leaves in Colombo are smaller than those in Anuradhapura)

--

- Develop an algorithm to handle images with heterogeneous backgrounds

---
class: center, middle

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).
.pull-right[.pull-down[ <a href="https://jayanilakshika.netlify.app/"> .white[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> https://jayanilakshika.netlify.app/] </a> <a href="https://twitter.com/home"> .white[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 
18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @LakshikaJayani] </a> <a href="https://github.com/JayaniLakshika"> .white[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 
1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @JayaniLakshika] </a> <br><br><br> ]]