Visualising the Indian-Startup Ecosystem



Introduction

This visualisation explores the startup ecosystem in India. The data covers a five-year period, from 2015 to 2020, and includes details of each startup's financing rounds, location and vertical. There is an in-depth data story to be derived from the available data points, and the visualisation makes it clearer. The story I aimed to explore covers sector-wise trends, the distribution of startups across cities, investment trends over time, and how the investments of one marquee investor compare with those of others.

Dataset

Investment Trends in Indian Startups

The dataset came from Kaggle, an open-data platform. The data sheet explicitly documented quite a few parameters, but some questions were left unanswered or up to self-discovery.

The data covers parameters such as:

1. Startup name

2. Startup funding round

3. Startup funding amount

4. Investor

5. City where founded

6. Funding date

7. Verticals

8. Sub-verticals

amongst others. In its initial form the sheet did not meet the principles of tidy data and needed cleaning, but it was good to use afterwards.

Tools used

The tools used to make this visualisation were Excel, Tableau Public and OpenRefine. The report was written and published using WordPress.

Process

Making this visualisation involved OpenRefine, Excel and Tableau Public. Each tool served a separate purpose.

Research

The first phase of my process involved researching open-data sources that could provide this data. I also used Excel to gauge the depth of the data and to check whether it was clean and well tagged.

OpenRefine - Data Cleaning

I used OpenRefine to clean the data. This involved clustering cells that held the same value but looked like different data points because of spelling errors. I also removed unwanted link text that was present in some cells. Each column held a single attribute, so no column splitting was required. After cleaning, I exported the clean sheet as a CSV file.
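To illustrate the clustering idea, here is a minimal Python sketch of a key-collision "fingerprint" grouping, similar in spirit to one of the clustering methods OpenRefine offers. It is not the exact procedure I ran; the function names and the sample city spellings are hypothetical and not taken from the dataset.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a key-collision fingerprint for a cell value."""
    # Trim, lowercase, and fold accented characters to plain ASCII.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort the unique whitespace-separated tokens.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical city spellings, not taken from the dataset.
sample = ["Bengaluru", "bengaluru ", "Bengaluru.", "New Delhi", "Delhi, New", "Mumbai"]
clusters = cluster(sample)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

A fingerprint like this only catches differences in case, spacing, punctuation and word order. Genuine misspellings need a distance-based (nearest-neighbour) method, which OpenRefine also provides.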

Tableau - Data Representation

The visualisations were built in Tableau Public. I imported the sheet cleaned in OpenRefine and used it as the data source. I also created a few new measures and dimensions for use in my visualisations. The newly created fields were as follows:

– COUNT([Vertical]) This counts the records that share the same vertical of business.

– COUNT([Investors]) This counts the records that share the same investors.

– IF CONTAINS([Investors], "Accel") THEN [Vertical] END This lists the verticals Accel invests in.

After creating these, I moved on to building my visualisations.
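For readers who do not use Tableau, the three calculated fields above can be approximated in plain Python. This is only a sketch on hypothetical rows: the startup names and values are made up, and the column names simply mirror the fields used in the formulas.

```python
from collections import Counter

# Hypothetical rows; column names mirror the fields used in the formulas.
rows = [
    {"Startup": "A", "Vertical": "FinTech", "Investors": "Accel Partners, Tiger Global"},
    {"Startup": "B", "Vertical": "FinTech", "Investors": "Sequoia Capital"},
    {"Startup": "C", "Vertical": "EdTech", "Investors": "Accel Partners"},
    {"Startup": "D", "Vertical": None, "Investors": "Sequoia Capital"},
]

# COUNT([Vertical]) broken down by Vertical: non-null records per vertical.
vertical_count = Counter(r["Vertical"] for r in rows if r["Vertical"] is not None)

# COUNT([Investors]) broken down by Investors: records per investor string.
investor_count = Counter(r["Investors"] for r in rows if r["Investors"] is not None)


# IF CONTAINS([Investors], "Accel") THEN [Vertical] END
# Rows that fail the test get None, mirroring the Null that Tableau returns
# when an IF has no ELSE branch. Python's `in` is case-sensitive.
def accel_vertical(row):
    investors = row["Investors"] or ""
    return row["Vertical"] if "Accel" in investors else None


accel_verticals = [accel_vertical(r) for r in rows]
print(vertical_count, investor_count, accel_verticals)
```

The `None` entries in the last result are the same Null values that later showed up in the Accel chart.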

Visualisations & Observations

1. Tree Chart

Vertical wise investment

For the vertical representation I used a tree chart, with vertical and investment amount as the attributes. The idea was to visualise which verticals have had the most money invested in them.

2. Area Chart

Year wise round-type trend

For the time-bound view of funding stages I used an area chart to show which round types attracted the most money over time. I used Date, Amount and Round type to build the visualisation. The area under each curve indicates the total amount invested in that round type over the years.
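The aggregation behind such a chart is a sum of amounts grouped by round type and year. The sketch below shows that grouping in Python on hypothetical records; the dates, round names and amounts are invented for illustration and the currency unit is left unspecified.

```python
from collections import defaultdict
from datetime import date

# Hypothetical funding records: (funding date, round type, amount).
records = [
    (date(2018, 3, 14), "Seed", 500_000),
    (date(2018, 9, 2), "Series A", 4_000_000),
    (date(2019, 1, 20), "Seed", 750_000),
    (date(2019, 6, 5), "Series A", 6_000_000),
    (date(2019, 11, 30), "Series A", 2_500_000),
]

# Sum the amount for every (round type, year) pair.
totals = defaultdict(lambda: defaultdict(int))
for funded_on, round_type, amount in records:
    totals[round_type][funded_on.year] += amount

# One year-ordered series per round type, i.e. one band of the area chart.
series = {rt: dict(sorted(by_year.items())) for rt, by_year in totals.items()}
for round_type, by_year in series.items():
    print(round_type, by_year)
```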

3. Packed Bubbles

City wise startup spread

For the city-wise startup count I used packed bubbles, with startup count and city as the attributes. I wanted to visualise which cities have the most startups based in them.

4. Time Series

Time based vertical wise analysis

To show which verticals attracted the most investments over time, I used the vertical count I had created, together with date and vertical name. I wanted to visualise the vertical-wise investment trend over time.

5. Text Table

Accel vertical competitors

For the Accel vertical analysis I used the created attribute listing the verticals Accel invests in, together with the investor dimension. I wanted to visualise which of the verticals Accel invests in also see a lot of interest from competing investors.
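One possible way to compute a table like this outside Tableau is sketched below. It is an alternative formulation, not a reproduction of the Tableau calculation: it assumes the investors field is a comma-separated string, and the rows and the `investors_of` helper are hypothetical.

```python
from collections import Counter

# Hypothetical rows; the Investors column is assumed to be comma-separated.
rows = [
    {"Vertical": "FinTech", "Investors": "Accel Partners, Tiger Global"},
    {"Vertical": "FinTech", "Investors": "Sequoia Capital"},
    {"Vertical": "FinTech", "Investors": "Tiger Global"},
    {"Vertical": "EdTech", "Investors": "Sequoia Capital"},
    {"Vertical": "HealthTech", "Investors": "Accel Partners"},
]


def investors_of(row):
    """Split the investor string into individual, trimmed names."""
    return [name.strip() for name in row["Investors"].split(",") if name.strip()]


# Verticals with at least one deal where an investor name contains "Accel".
accel_verticals = {
    r["Vertical"] for r in rows if any("Accel" in name for name in investors_of(r))
}

# Count deals by other investors inside those verticals only.
competitor_deals = Counter()
for r in rows:
    if r["Vertical"] not in accel_verticals:
        continue
    for name in investors_of(r):
        if "Accel" not in name:
            competitor_deals[(r["Vertical"], name)] += 1

for (vertical, investor), deals in sorted(competitor_deals.items()):
    print(vertical, investor, deals)
```

Verticals where Accel has no competing deals, like the hypothetical HealthTech row here, simply produce no entries rather than Null rows.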

Reflection and Critique

Limitations

The data cleaning exercise with OpenRefine, although helpful, was a little limiting as well. It still took a lot of manual combing through to remove unwanted links. Given the size of the data, I wish there had been a more formula-based approach to cleaning it, especially for the cells that required manual intervention.
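As an example of what a more formula-based approach could look like, the Python sketch below strips link text from cells with a regular expression. I did not use this in the project; the pattern, the `strip_links` helper and the sample cells are assumptions, and the pattern would need tuning against the real data.

```python
import re

# Matches anything that starts like a web link and runs to the next space.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_links(cell: str) -> str:
    """Remove link text from a cell and tidy the leftover whitespace."""
    without_links = URL_PATTERN.sub(" ", cell)
    return re.sub(r"\s+", " ", without_links).strip()


# Hypothetical cell values, not taken from the dataset.
sample_cells = [
    "Acme Pay https://example.com/funding-news",
    "Acme www.example.org/story Cabs",
    "Acme Learning",
]
cleaned = [strip_links(cell) for cell in sample_cells]
print(cleaned)
```

Because the pattern consumes everything up to the next space, punctuation glued to the end of a link is removed along with it, so the output should still be spot-checked.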

Positives

Tableau is a very detailed tool for building visualisations. It helps to build custom charts based on the story one wishes to tell or the information one wishes to derive. It is highly customisable and requires minimal coding, especially for basic charts. The visualisations also helped me understand larger market trends that would have been missed by merely combing through the data. One notable thing about the time-bound visualisations is the slump in investments with the onset of Covid.

Peer Critique and changes

My final chart, the analysis of verticals Accel invested in, was visually heavy with different colours and also had Null values plotted. As suggested, I made the colour uniform and removed the Null values, which were cluttering the chart.

Bibliography

Investment Trends in Indian Startups. (n.d.). Retrieved February 21, 2023, from https://www.kaggle.com/datasets/thedevastator/investment-trends-in-indian-startups

Indian Startup Ecosystem. (n.d.). Tableau Software. Retrieved February 21, 2023, from https://public.tableau.com/views/IndianStartupEcosystem_16769449138430/Summariseddashboard%3Adisplay_static_image=y&%3AbootstrapWhenNotified=true&%3Aembed=true&%3Alanguage=enUS&publish=yes&:embed=y&:showVizHome=n&:apiID=host0#navType=0&navSrc=Parse&1