[vimeo 195745074 w=640 h=400]
This video demonstrates the process of formatting a .csv file, with data in it, for proper insert into a MySql database.
The tools needed to carry out this process ars:
- A free copy of libreoffice (we will be using “Calc”).
- A free copy of MySQL Workbench.
- Your favorite text editor(optional)
- A .csv file with data in it (preferably with column headings).
- A little bit of patience!
Note on Calc:
Calc is the preferred tool for this method for a couple a reasons. The first is because Calc will allow the user to enclose data fields with text, or data that won’t have a calculation run on it, in single quotes or double quotes. This is crucial for a database like MySQL because unless text, or strings, are enclosed in some sort of quotation marks the database won’t recognise the data. If your data is all numbers or “Integers”, quotes wont be an issue. Additionally double quoted text won’t run into an issue if a single quote is used as an apostrophe. Another benefit to having all text within double quotes is cells with no value can be easily made into null values once entered into the database table.
Let’s jump in.
First an .csv file is needed with some data and column headers in the first row. The data used in the video is scraped data from the Boston Museum of Fine Arts website. The data should be relatively clean in that columns should have consistent data in them ie. if there’ s a column with prices, all the data in the column should be a number, without a dollar sign. If that price column represents the ‘cents’ portion of a price then decide how many decimal places you want stores (with ‘cents’ it will always be 2). In other words data within rows should be consistent.
Once the data is consistent and the way we want it, we should add a column to the left of our first column.
Next save the .csv file to your desired location, and check the edit filter settings box.
Here we can set how each column is separated and what to enclose text or strings in. We want to enclose strings in double quotes (“”) and delineate fields by comma (,).
At this point look at your data, and focus on the cells where there is no data. Why is there no data? Is that data point unknown meaning, could it be know later? Is it empty for a reason and will never be filled? If those cells will never be filled, and are empty for a reason, find and replace those empty fields with some text like UNKNOWN or EMPTY. If these fields may be know at a later date keep them empty for now.
Now that we have a .csv file let’s start the import to a newly created MySQL database. If you haven’t downloaded MySQL workbench do so from here, and follow the instructions to install.
Note: If it’s your first time installing MySQL Workbench, do not forget your root username or password. This is very important. If you intend on changing the given password write it down anyway.
Once Workbench is up and running let’s create a database and table to insert the data into.
This is our workspace. The first query will be:
(Pressing the yellow lightning bolt will run the query). The text `DHtestdb` could be anything, but keep the name inside of backticks (` `). Backticks deactivate SQLs keywords, allowing the user to use any words even if sql deems them “special”.
Now that we have a database made, let’s create a table inside the database:
Notes about the table:
First we create an id for each record, making it easier to query and update the table. The rest of the columns correspond with our .csv files headers. VARCHAR denotes the data type for each column. VARCHAR is a string of various characters, numbers or letters.
Now we have an empty table in our database to insert the data from our .csv file.
Our empty table should look like this:
Now let’s start the process of importing the .csv.
Click the arrow to the left of the database you created so that the drop down “tables” drops down. Then hit the tables arrow to reveal the table you just created.
Right click on the table and select “Import Data Wizard”.
Navigate to the previously created .csv and click next. Then click the radio button for use existing table and use the table that was just created. Workbench will then read the headers of the .csv file and match them with the column names of the table just created.
Unclick match source column and make sure on your own that the headers and columns match. Click next and the data will start to be entered into the table.
Now we want to clean up the data a little bit more and change those empty records to nulls.
Next run a query like this for the dynastic period column:
This query will change all of the records in the dynastic period column that has nothing in it to a null value.
Now we have our data in a querys that sort our data in ways like such:
These are some of the ways data can be parsed with with good good amount of control thanks to the structure of a MySQL database.