6 Database Use

We can read a CSV file from the working directory into a dataframe in R.

df = read.csv(file="datafile.csv")

Let’s look at what we got:

head(df)  # displays the first six rows by default; this file has only three
##     Name   x   y   p   q Rhat
## 1  First   3   5   1   6    7
## 2 Second  12  53  34  65   65
## 3  Third 143 643 376 762  191
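
To reproduce this step without the original file, the sketch below writes the three rows printed above to a temporary CSV and reads them back. The file contents are reconstructed from the printed output, and `csv_path` is a hypothetical name used only for illustration; the real `datafile.csv` may contain further columns or rows.

```r
# Hypothetical reconstruction of datafile.csv from the rows printed above.
csv_path <- file.path(tempdir(), "datafile.csv")
writeLines(c(
  "Name,x,y,p,q,Rhat",
  "First,3,5,1,6,7",
  "Second,12,53,34,65,65",
  "Third,143,643,376,762,191"
), csv_path)

# read.csv() treats the first line as the header by default.
df <- read.csv(file = csv_path)

# str() shows the type that R inferred for each column.
str(df)
```

Checking the inferred column types with `str()` right after reading a file is a cheap way to catch numbers that were read as text.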

Let’s define a new column, z, where \(z=\sqrt{x^2+y^2}\) for each row of our data:

df$z = sqrt(df$x^2 + df$y^2)
head(df)
##     Name   x   y   p   q Rhat          z
## 1  First   3   5   1   6    7   5.830952
## 2 Second  12  53  34  65   65  54.341513
## 3  Third 143 643 376 762  191 658.709344
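
The new column works because `sqrt()` and `^` are vectorised: they operate on whole columns at once, so no explicit loop over rows is needed. As a minimal sketch, the same computation can be written with `with()`, which avoids repeating the `df$` prefix. The dataframe `demo` is a hypothetical copy of the example data, included only so that the snippet runs on its own.

```r
# Hypothetical copy of the example data, so the snippet is self-contained.
demo <- data.frame(Name = c("First", "Second", "Third"),
                   x = c(3, 12, 143),
                   y = c(5, 53, 643))

# with() evaluates the expression using the columns of demo as variables.
demo$z <- with(demo, sqrt(x^2 + y^2))

# Check the first row by hand: sqrt(3^2 + 5^2) = sqrt(34).
stopifnot(isTRUE(all.equal(demo$z[1], sqrt(34))))
```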

We can now count the rows and columns of our dataframe and display the rows that satisfy given conditions:

nrow(df)  # count number of   rows  in the dataframe
## [1] 3
ncol(df)  # count number of columns in the dataframe
## [1] 7
df[df$x>10,]
##     Name   x   y   p   q Rhat         z
## 2 Second  12  53  34  65   65  54.34151
## 3  Third 143 643 376 762  191 658.70934
df[df$z<60,]
##     Name  x  y  p  q Rhat         z
## 1  First  3  5  1  6    7  5.830952
## 2 Second 12 53 34 65   65 54.341513
df[df$x>10 & df$z<60,]
##     Name  x  y  p  q Rhat        z
## 2 Second 12 53 34 65   65 54.34151

Of the 3 rows in our dataframe, the only one that satisfies both \(x>10\) and \(z<60\) is the row with Name = Second.
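
The same filter can also be written with `subset()`, and rows can be reordered with `order()`. The sketch below is one possible way to do this in base R, not the only one. The names `demo`, `both` and `by_z_desc` are hypothetical, and `demo` repeats the example data so that the snippet runs on its own.

```r
# Hypothetical copy of the example data, so the snippet is self-contained.
demo <- data.frame(Name = c("First", "Second", "Third"),
                   x = c(3, 12, 143),
                   y = c(5, 53, 643))
demo$z <- sqrt(demo$x^2 + demo$y^2)

# subset() looks up x and z inside demo, so the demo$ prefix is not needed.
both <- subset(demo, x > 10 & z < 60)

# order() returns the row indices that sort by z;
# decreasing = TRUE puts the largest z first.
by_z_desc <- demo[order(demo$z, decreasing = TRUE), ]

both$Name
by_z_desc$Name
```

One difference to keep in mind: when the condition evaluates to `NA` for some row, logical indexing with `[ , ]` returns a row of `NA`s for it, whereas `subset()` drops that row.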