Seeds contribute 15 to 20% of agricultural output and are one of the basic determinants of harvest quality, alongside factors such as soil quality, irrigation, fertilizers and insecticides. It is therefore important to start with good-quality seeds, and our AI predicts seed quality in a few seconds.
Agriculture and its allied sectors account for 15.87% of India’s total GVA (Gross Value Added), at $375.61 billion, and 7.39% of total global agricultural output. That lags far behind China, whose share of global agricultural output is 19.49%. While the world average contribution of agriculture to the economy is 6.4%, for India it stands at more than 15%, which makes agriculture one of the crucial factors influencing India’s GDP. Despite having about 50% of its population engaged in agriculture, India lags behind China. Why so?
In India, most farmers have received little or no formal education. As a result, they are easily misled by private players who use impressive-sounding technical terms to confuse them into buying low-quality seeds, which later reduce crop yield.
Seeds are the heart of agro-economic and horticultural crops. Characteristics such as genetic factors, germination percentage, appearance, vigor, and purity play a crucial role in crop yield.
Let us look at some of these factors to better understand the key elements that determine seed quality, and at what can be done to make this assessment easier for farmers, since it isn’t possible to test each and every seed before planting. The applications of AI in farming are tremendous and can change the future of agriculture, especially in India.
Aspects of seed Quality:
Genetic Factors :
According to Jack Harlan, an agronomist and botanist, (genetic diversity in crops) “stands between us and catastrophic starvation on a scale we cannot imagine”. We have lost thousands of hectares of agricultural land to urbanization while the population keeps growing at exponential rates.
Only 11 percent of land, that is, only 1.2 billion hectares out of 13.4 billion hectares, is used for crop production. To feed a population of 7 billion we need to grow more food, but we don’t have enough land. So what do we do now?
The answer is to increase crop production per unit area of the land available. Owing to the Green Revolution in the 1960s, wheat production shot up from 11 million tonnes in the 1960s to 55 million tonnes in the 1990s. That’s how much genetics affects crops.
Today we have many genetically modified crops available. For example: corn, modified to protect itself from rootworms and Asian corn borers; soybeans, modified for pest resistance and herbicide tolerance; and cotton, modified to resist the cotton bollworm.
Certain traits are carried by recessive alleles. Upon breeding, if a recessive allele is paired with a dominant allele for the same trait, it cannot express itself, or in some cases expresses itself only partially, as happens in hybrids. Hybridization over several generations may lead to the loss of traits originally found in pure varieties. Hence, hybridization should be carried out in a controlled way. Allowing different species to pollinate on their own won’t necessarily result in a better breed of crops.
So, we have genetically modified seeds available which should be used for agricultural purposes for better yield.
Germination Percentage :
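It is the percentage of seeds that emerged as plants out of all the seeds planted, i.e.:
$$\text{Germination percentage} = \frac{\text{number of seeds that emerged as plants}}{\text{total number of seeds planted}} \times 100$$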
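As a quick illustration of the definition above, here is a minimal sketch in Python. The function name and the tray numbers are hypothetical and are not taken from our project.

```python
def germination_percentage(seeds_germinated: int, seeds_planted: int) -> float:
    """Return the share of planted seeds that emerged as plants, in percent."""
    if seeds_planted <= 0:
        raise ValueError("seeds_planted must be positive")
    if not 0 <= seeds_germinated <= seeds_planted:
        raise ValueError("seeds_germinated must be between 0 and seeds_planted")
    return 100.0 * seeds_germinated / seeds_planted


# Hypothetical tray: 200 seeds planted, 170 emerged as plants.
print(germination_percentage(170, 200))  # 85.0
```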
Several internal and external factors are responsible for the determination of germination.
- Water: a dormant seed contains 6 to 15 percent water, while germination and other metabolic activities require 75 to 95 percent water. Hence, the seed needs to absorb water from its surroundings.
- Oxygen: oxygen is necessary for respiration, that is, the breakdown of glucose to release carbon dioxide and energy. This energy is used for the metabolic processes necessary for growth.
- Temperature: germination can take place between 5 and 40 degrees Celsius, while it stops at 0 or 45 degrees.
Abscisic acid is a plant hormone responsible for maintaining seed dormancy.
Gibberellins (GAs) are another group of plant hormones; they break seed dormancy and assist in germination. Along with GAs, several other hormones such as ethylene, cytokinins, and brassinosteroids also promote seed germination.
Appearance and Purity :
Appearance is also an important factor in determining the quality of seeds. We have taken into account this factor and developed a deep learning model to predict the quality of seeds that go into farming.
How does our model work?
We have also developed an app for this project. First, the farmer takes a picture of the seeds and uploads it using our app; this picture acts as the input. Our model then analyzes each and every grain in the picture, judges seed quality from appearance, and classifies what it finds as:
- Grain: contains healthy seeds.
- Damaged grains: contains unhealthy or deformed grains.
- Foreign: particles other than wheat grains.
- Broken grains: As the name suggests, contains broken grains.
- Grain cover: contains the seed coats only.
These classes act as output.
The dataset used for our machine learning model for seed quality prediction consists of pictures of wheat grains, to which we applied edge detection for data extraction and image segmentation. This image processing technique works by detecting discontinuities in brightness, which mark the boundaries of the objects in the image.
With the data extracted, we used OpenCV's floodFill() function to fill each connected component with a single color.
Syntax : cv2.floodFill(image, mask, seedPoint, newVal[, loDiff[, upDiff[, flags]]]) → retval, rect.
The problem we faced was that we had a large dataset with a large number of variables. Hence, to reduce its dimensionality we used the Principal Component Analysis (PCA) algorithm. Although this comes at some cost in accuracy, smaller data sets are easier and faster for a machine learning algorithm to analyze and explore.
The first step in the process is to standardize the data so that every variable falls on the same scale. If the data is not standardized and one of the parameters has much larger numerical values, the result will be dominated by that parameter and the others will effectively be ignored.
It is done using the following equation:
$$z = \frac{x - \mu}{\sigma}$$
where x is a value of the variable, μ is the mean of that variable and σ is its standard deviation.
Next, we calculate the covariance matrix of the dataset to find the relations between the input variables, so that redundant information can be removed where two or more variables share high covariance.
Then, by computing the eigenvectors and eigenvalues of the covariance matrix, the principal components can be obtained.
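The steps above can be sketched with NumPy as follows. The feature matrix is randomly generated for illustration and is not our dataset. The number of components kept (k = 2) is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 200 samples, 3 features on very different scales.
# The second feature is almost a multiple of the first, so it is redundant.
f1 = rng.normal(size=200)
X = np.column_stack([f1, 1000.0 * f1 + rng.normal(size=200), rng.normal(size=200)])

# Step 1: standardize each feature to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features.
cov = np.cov(Z, rowvar=False)

# Step 3: eigen-decomposition (eigh suits symmetric matrices), sorted by variance.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Step 4: keep the top k components and project the data onto them.
k = 2
X_reduced = Z @ eigenvectors[:, :k]
explained = eigenvalues[:k].sum() / eigenvalues.sum()

print(X_reduced.shape, round(float(explained), 3))
```

Because two of the three features carry nearly the same information, two components retain almost all of the variance here.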
The images contained noise, which can be modelled as a random variable with zero mean. To remove these noisy values from the colored images we used the fastNlMeansDenoisingColored() function. It takes the following parameters:
- src : the input image.
- dst : output image with the same size and type as the input image.
- templateWindowSize : size (in pixels) of the template patch used to compute weights.
- searchWindowSize : size (in pixels) of the window used to compute the weighted average for a given pixel.
- h : regulates filter strength for the luminance component.
- hColor : same as h, but for the color components.
cv2.fastNlMeansDenoisingColored(src[, dst[, h[, hColor[, templateWindowSize[, searchWindowSize]]]]]) → dst
We built two classifiers in Python for our machine learning model for predicting seed quality:
- Our first classifier divides the dataset into the five classes mentioned above.
- Our second classifier categorizes the dataset as grain or not grain, where grain contains the Grain class and not grain contains the other four classes (damaged grain, foreign, grain cover, broken grain).
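The relation between the two label sets can be sketched as a simple mapping. The identifier spellings below are assumptions chosen for illustration.

```python
# The five classes listed earlier in the article (identifier spellings are assumptions).
FIVE_CLASSES = ["grain", "damaged_grain", "foreign", "broken_grain", "grain_cover"]


def to_binary_label(five_class_label: str) -> str:
    """Collapse a five-class label into the second classifier's two classes."""
    if five_class_label not in FIVE_CLASSES:
        raise ValueError(f"unknown class: {five_class_label}")
    return "grain" if five_class_label == "grain" else "not_grain"


labels = ["grain", "foreign", "grain", "broken_grain"]
print([to_binary_label(label) for label in labels])
# ['grain', 'not_grain', 'grain', 'not_grain']
```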
The Model that predicts the quality of seeds :
We have trained our model on the dataset of wheat seeds to identify and classify them as grains or not grains.
We start building our model for seed quality prediction by importing necessary libraries and classes.
In Keras, a model can be built in two ways: with the Sequential model, by stacking layers one upon the other, or with the Model class (functional API). As the former method is easier, we prefer it.
Dense describes a layer in which each node is connected to every node in the next layer, i.e. a fully connected layer.
Dropout is used to prevent our network from overfitting, which would otherwise cause the model to memorise the training data and make errors when introduced to a new dataset.
We define a function make_model which takes two integer parameters, input_shape and output_shape. As mentioned before, we will be making a sequential model, hence we call the Sequential() constructor.
Next we use the add() function to add layers. The first layer tells our model the size of the input, hence it contains the parameter input_dim. Here, 64 is the number of neurons in the first layer, whereas in the next line, 32 is the number of nodes in the following hidden layer. An activation function is also applied in each layer. Activation is applied in order to introduce non-linearity into the model, so that the input signal of a neuron can be converted into its output signal. Without the activation, the output would be a linear function, and that is not powerful enough to learn complex mappings from the input data.
Here we have applied “ReLU” (Rectified Linear Unit) activation to the weighted sum of the inputs, i.e. the sum of the products of all the inputs and their associated weights. We use ReLU because it converges better and helps avoid the vanishing gradient problem: its gradient is 1 for every positive input, so it does not shrink as it passes back through the layers.
What is a vanishing gradient?
Activation functions such as sigmoid produce outputs in the range 0 to 1; that is, they squeeze a large input space into a small output range, so that even a large change in the input produces only a small change in the output. The derivative therefore becomes smaller and smaller, and as the gradient of the loss function approaches zero, it becomes harder to train the model. This is called the vanishing gradient problem.
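The effect can be seen with a small numerical sketch. It multiplies one derivative per layer and ignores the weights, which is a simplifying assumption. The depth of 10 layers is arbitrary.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0


# Backpropagation multiplies one such factor per layer (weights ignored here).
layers = 10
sigmoid_chain = sigmoid_grad(0.0) ** layers  # best case for sigmoid: 0.25 per layer
relu_chain = relu_grad(1.0) ** layers        # active ReLU unit: 1.0 per layer

print(sigmoid_chain)  # 0.25 ** 10, roughly 9.5e-07
print(relu_chain)     # 1.0
```

Even in the best case for sigmoid, the signal reaching the early layers is about a million times smaller after ten layers, while an active ReLU unit passes it through unchanged.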
However, ReLU is not a good fit for the output layer of a classifier, because its outputs are not probabilities. Since our network is a classification model, we use softmax activation in the output layer to calculate the probability of a sample falling into each class. If it were a regression problem, we would use a linear function.
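As a minimal sketch of what softmax does, the function below turns raw scores into probabilities. The five scores are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the five classes.
probs = softmax([2.0, 0.5, 0.1, -1.0, 0.3])
predicted_class = probs.index(max(probs))
print(predicted_class)  # 0, the class with the largest score
```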
Finally, we compile the model.
compile (optimizer, loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Here, we have compiled our model using the Adam (Adaptive Moment Estimation) optimiser. It is used in place of the classical stochastic gradient descent algorithm, which applies the same learning rate to all weight updates and does not change that rate during training; Adam instead adapts the step size for each weight. Here, we have used a learning rate of 0.001.
Loss is a numerical value indicating how bad a model’s prediction was. If the prediction is perfect, the loss is 0; otherwise, it is greater.
Categorical cross-entropy is used when each sample can belong to only one of the given classes. It is represented as:
$$L(y, y') = -\sum_{i} y_i \log(y'_i)$$
where y is the true value and y’ is the predicted value. The true label is one-hot encoded, i.e., it is 1 for the true class and 0 for the others.
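A minimal sketch of this loss for a single sample follows. The predicted probabilities are invented for illustration. The small eps guard against log(0) is an implementation assumption.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: -sum(y_i * log(y'_i)), with y one-hot encoded."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [1, 0, 0, 0, 0]  # one-hot: the sample belongs to the first class

confident = categorical_crossentropy(y_true, [0.9, 0.025, 0.025, 0.025, 0.025])
unsure = categorical_crossentropy(y_true, [0.2, 0.2, 0.2, 0.2, 0.2])
perfect = categorical_crossentropy(y_true, [1.0, 0.0, 0.0, 0.0, 0.0])

print(confident, unsure, perfect)
```

The loss shrinks as the probability assigned to the true class grows, and reaches 0 when that probability is 1.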
The metrics argument typically takes “accuracy”, as is the case with our model.
Here, we have finished building our model. Next, we train it with a batch size of a hundred samples for 2000 epochs.
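Putting the pieces together, a minimal sketch of such a model could look as follows. This is not our exact code: the dropout rate, its position, the feature count and the random training data are assumptions. keras.Input is used here as an equivalent of passing input_dim to the first Dense layer.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def make_model(input_shape: int, output_shape: int) -> keras.Model:
    """Build and compile a small fully connected classifier."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_shape,)))  # same role as input_dim
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.2))  # rate and position are assumptions
    model.add(layers.Dense(32, activation="relu"))
    model.add(layers.Dense(output_shape, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Random stand-in data: 200 samples, 10 hypothetical features, 5 one-hot classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 10)).astype("float32")
targets = keras.utils.to_categorical(rng.integers(0, 5, size=200), num_classes=5)

model = make_model(input_shape=10, output_shape=5)
# The article trains with batch_size=100 for 2000 epochs; 2 epochs keep this demo quick.
history = model.fit(features, targets, batch_size=100, epochs=2, verbose=0)
probabilities = model.predict(features[:3], verbose=0)
print(probabilities.shape)  # (3, 5)
```

Each row of the prediction holds one probability per class, produced by the softmax output layer.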
Furthermore, filters are applied for better accuracy.
Our model classifies the seeds as:
| Percentage of healthy seeds | Sample classified as |
|---|---|
| Above 80 | Excellent Quality |
| Below 20 | Very Poor Quality |
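The final grading step can be sketched as below. Only the two bands shown in the table are implemented; intermediate bands are not listed there, so the sketch returns None for them. The function names and the class label "grain" are assumptions.

```python
from typing import Optional


def healthy_percentage(predicted_classes) -> float:
    """Share of detected objects that were classified as healthy grain, in percent."""
    if not predicted_classes:
        raise ValueError("no predictions given")
    healthy = sum(1 for c in predicted_classes if c == "grain")
    return 100.0 * healthy / len(predicted_classes)


def quality_label(percent_healthy: float) -> Optional[str]:
    """Map the healthy-seed percentage to the two bands shown in the table."""
    if percent_healthy > 80:
        return "Excellent Quality"
    if percent_healthy < 20:
        return "Very Poor Quality"
    return None  # intermediate bands are not listed in the table


sample = ["grain"] * 9 + ["broken_grain"]
print(quality_label(healthy_percentage(sample)))  # Excellent Quality
```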
Hence, it will prove to be a great aid to farmers. Good-quality seeds ensure better yield, and a quick quality check also helps in deciding a fair price for the seeds. Farmers won’t be manipulated and tricked into buying poor-quality seeds, as samples can easily be checked using our AI technique for predicting seed quality, and since the app is multilingual, they won’t face any language barriers either.
Developed by Chirag Singla.
Content is written by Durgesh Nandini