Introduction
I came across a dataset on the agriculture sector in India. As India is primarily an agricultural country, I thought it would be interesting to study data from this sector.
The questions I had while going through the data were:
1. Which states in India contribute the most to agricultural production?
2. What are the five highest-yielding crops in India, and why do they contribute more?
This raises a sub-question: what are the growing conditions of the high-yield crops in India? Studying them lets us understand which agricultural trends or conditions are most beneficial, keep applying them, and start working on those parameters in other states as well.
Dataset
For this data visualization, I gathered raw data from Kaggle in CSV format. The dataset was named “Production about India”. It uses years as the time dimension. The readings are the production of various crops in all states and union territories across India, further broken down by crop type. The original data had 13 columns of quantitative and qualitative data and 99850 observation rows. I then imported the data into R to work on the visualization.
Process
I began by learning R, as that was the platform I had chosen for working with my data. Once I understood the basics, I created various subsets of the data using the filter and select functions together with pipes.
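The subsetting step described above can be sketched roughly as follows. This is only an illustration: it assumes the dplyr package, and the data frame, its column names (State, Crop, Year, Area, Production) and its values are made up, since the actual columns of the Kaggle file are not listed here. The group_by and summarise step is also an assumption about how state totals might be computed.

```r
library(dplyr)

# Toy stand-in for the imported CSV. Column names and values are hypothetical.
crops <- data.frame(
  State      = c("State A", "State A", "State B", "State B", "State C"),
  Crop       = c("Rice", "Wheat", "Rice", "Cotton", "Rice"),
  Year       = c(2010, 2010, 2010, 2010, 2011),
  Area       = c(100, 120, 80, 90, 70),
  Production = c(400, 500, 200, 150, 210)
)

# Keep only the rice rows, then only the columns needed for plotting.
rice <- crops %>%
  filter(Crop == "Rice") %>%
  select(State, Year, Area, Production)

# Total production per state (all crops combined), largest first.
state_totals <- crops %>%
  group_by(State) %>%
  summarise(Production = sum(Production), .groups = "drop") %>%
  arrange(desc(Production))

print(rice)
print(state_totals)
```

Each pipe passes the result of one step into the next, so the subset can be read top to bottom in the order the operations are applied.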
Once this filtering was done, I proceeded to create graphs and diagrams to understand agricultural trends across the different states of India. I used ggplot to create all my graphs, and adjusted size, color, diagram type and labeling through it.
Throughout this process I concentrated on which parameters to consider, but my peers suggested that I also work on the presentation. That is when I started experimenting with the size and color of the elements.
Result
When I started analyzing the data, I chose to begin with the state-wise data. I first wanted to understand which states perform better in agriculture and which crop each of them produces the most. Next, I wanted to understand the growing conditions of the highest-yield crop, so that overall we can see which crop types, conditions and states are profitable for agriculture. Once this study is understood and presented, it will be easier for farmers in India to work on those aspects and achieve a good yield.
The following graph shows the yields of various crops across India. We can clearly see that rice is the highest-yielding crop in India, followed by wheat, potato, cotton and bananas.
The following graph represents the Production in tons (all crops combined) in individual states.
The following graph represents the agricultural land area in individual states.
Finally, I combined both diagrams into one graph using facet wraps. The following visualization shows the area under agriculture and the production in tons. The states that appear to lead in agriculture have less land area but the highest production. Punjab is the highest-producing state, followed by Gujarat, Haryana and West Bengal.
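A minimal sketch of such a faceted chart is given below, assuming ggplot2 and a long-format table with one row per state and measure. The data frame, its column names (State, Measure, Value) and its values are hypothetical and are not taken from the dataset.

```r
library(ggplot2)

# Toy long-format data: one row per state and measure. Values are made up.
state_long <- data.frame(
  State   = rep(c("State A", "State B", "State C"), times = 2),
  Measure = rep(c("Area", "Production in tons"), each = 3),
  Value   = c(220, 170, 70, 900, 350, 210)
)

p <- ggplot(state_long, aes(x = State, y = Value, fill = Measure)) +
  geom_col() +
  # One panel per measure; free y scales because the units differ.
  facet_wrap(~ Measure, scales = "free_y") +
  labs(x = "State", y = NULL) +
  theme(legend.position = "none")

print(p)
```

Placing area and production in side-by-side panels with independent y axes makes it possible to compare the ranking of states in each measure even though the two are in different units.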
As rice is the most cultivated crop in India, analyzing it further gives clearer insights. For this I prepared a point geometry (scatter plot) to understand the average temperature and rainfall across the states. The dots are color coded by temperature and sized by the average rainfall of each state.
Conditions for rice cultivation are favorable in Sikkim, Arunachal Pradesh, Goa and the Andaman and Nicobar Islands.
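The point geometry described above, with color mapped to temperature and size mapped to rainfall, could look roughly like this in ggplot2. The data frame, the column names (State, Production, AvgTemp, AvgRain), the values and the choice of axes are all assumptions made for illustration.

```r
library(ggplot2)

# Toy per-state averages for rice. Names and values are hypothetical.
rice_climate <- data.frame(
  State      = c("State A", "State B", "State C", "State D"),
  Production = c(400, 200, 210, 320),
  AvgTemp    = c(18, 27, 24, 29),        # degrees Celsius (assumed unit)
  AvgRain    = c(2700, 3000, 650, 800)   # millimetres (assumed unit)
)

p <- ggplot(rice_climate, aes(x = State, y = Production)) +
  # Color encodes average temperature, point size encodes average rainfall.
  geom_point(aes(color = AvgTemp, size = AvgRain)) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(
    x = "State", y = "Rice production",
    color = "Avg. temperature", size = "Avg. rainfall"
  )

print(p)
```

Mapping two extra variables to color and size keeps everything in a single chart, at the cost of making exact values harder to read than they would be on an axis.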
Reflections
I think learning R and data visualization together worked well for this project. More experimentation with the presentation, such as colors and the customization of certain details, would have elevated the visualization as a whole.
References
“Kaggle: Your Machine Learning and Data Science Community.” Accessed October 3, 2023. https://www.kaggle.com/datasets/imdevskp/cholera-dataset