Learn more about connecting to databases with R: https://www.datacamp.com/courses/importing-data-in-r-part-2
Welcome to part two of importing data in R! The previous course dealt with accessing data stored in flat files or Excel files. In a professional setting, you'll also encounter data stored in relational databases. In this video, I'll briefly talk about what a relational database is and then I'll explain how you can connect to one. In the next video, I'll explain how you can import data from it!
So, what's a relational database? There's no better way to show this than with an example. Take this database, called company. It contains three tables: employees, products and sales.
Like in a flat file, information is displayed in a table format. The employees table has five records and three fields, namely id, name and started_at. The id here serves as a unique key for each row or record. Next, the products table contains the details on four products. We're dealing with data from a telecom company that's selling products both with and without a contract. Here too, each product has an identifier. Finally, there's the sales table. It lists which products were sold by whom, when, and for what price. Notice here that the ids in employee_id and product_id correspond to the ids that you can find in the employees and products tables respectively. The third sale, for example, was made by the employee with id 6, so Julie. She sold the product with id 9, so the Biz Unlimited contract. These relations make this database very powerful. You store each piece of information only once, in nicely separated tables, but can connect the dots between different records to model what's happening.
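To make the idea of connecting the dots concrete, here is a minimal sketch in base R that mimics the three tables with data frames and looks up the third sale. It is an illustration only: apart from Julie (id 6) and Biz Unlimited (id 9), every row and value below is a made-up placeholder, and only the columns needed for the lookup are included.

```r
# Illustrative stand-ins for the three tables. Only Julie (id 6) and
# Biz Unlimited (id 9) come from the example; all other rows are made up.
employees <- data.frame(id = c(1, 6),
                        name = c("Employee A", "Julie"),
                        stringsAsFactors = FALSE)
products <- data.frame(id = c(2, 9),
                       name = c("Product A", "Biz Unlimited"),
                       stringsAsFactors = FALSE)
sales <- data.frame(id = 1:3,
                    employee_id = c(1, 1, 6),
                    product_id = c(2, 9, 9),
                    stringsAsFactors = FALSE)

# Rename the key and name columns so the joined result is unambiguous.
names(employees) <- c("employee_id", "employee")
names(products) <- c("product_id", "product")

# Follow the ids from sales into the other two tables.
joined <- merge(merge(sales, employees, by = "employee_id"),
                products, by = "product_id")
joined <- joined[order(joined$id), c("id", "employee", "product")]

# Look up who made the third sale and what was sold.
third_sale <- joined[joined$id == 3, ]
print(third_sale)
```

In a real relational database, this kind of lookup is done by the DBMS itself rather than in R; the data frames here only stand in for the tables.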
How the data in a relational database is stored and shuffled around when you make changes depends on the so-called database management system, or DBMS, you're using. Open-source implementations such as MySQL, PostgreSQL and SQLite are very popular, but there are also proprietary implementations such as Oracle Database and Microsoft SQL Server. Practically all of these implementations use SQL, or sequel, as the language for querying and maintaining the database. SQL stands for Structured Query Language.
Depending on the type of database you want to connect to, you'll have to use different packages. Suppose the company database I introduced before is a MySQL database. This means you'll need the RMySQL package. For PostgreSQL you'll need RPostgreSQL, for Oracle you'll use ROracle, and so on. How you interact with the database, so which R functions you use to access and manipulate the database, is specified in another R package called DBI. In more technical terms, DBI is an interface, and RMySQL is the implementation. Let's install the RMySQL package, which automatically installs the DBI package as well. Loading only the DBI package will be enough to get started.
The first step is creating a connection to the remote MySQL database. You do this with dbConnect(), as follows.
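A minimal sketch of such a call, wrapped in a hypothetical helper named connect_company() so the pieces are easy to see. The host, username and password in the usage comment are placeholders, not the course's actual connection details, and the default port of 3306 is simply MySQL's standard port.

```r
# One-time setup (installing RMySQL also installs DBI):
# install.packages("RMySQL")

# Hypothetical helper, not part of DBI or RMySQL: it wraps dbConnect()
# so that each connection detail is passed explicitly.
connect_company <- function(host, username, password, port = 3306) {
  DBI::dbConnect(
    RMySQL::MySQL(),       # driver object constructed by RMySQL
    dbname = "company",    # name of the database to open
    host = host,           # where the database is hosted
    port = port,           # 3306 is MySQL's default port
    username = username,   # credentials used to authenticate
    password = password
  )
}

# Usage with placeholder values (replace them with real ones):
# con <- connect_company("db.example.com", "my_user", "my_password")
# DBI::dbDisconnect(con)   # close the connection when you're done
```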
The first argument specifies the driver that you will use to connect to the MySQL database. It may look a bit strange, but the MySQL() function from the RMySQL package simply constructs a driver for us that dbConnect() can use. Next, you have to specify the database name, where the database is hosted, through which port you want to connect, and finally the credentials to authenticate yourself. This is an actual database that we're hosting, so you can try these commands yourself!
The result of the dbConnect() call, con, is a DBI connection object. You'll need to pass this object to whatever function you're using to interact with the database. Before we do that, let's get familiar with this connection object in the exercises!