Statistics Homework Using R
Data Analysis Using Excel
For each of the following problems, save your work to a .r file. Name your files following the pattern in this example: my file for problem 2 would be Hendrix_Jeremy_HW3_2.r
Upload your five files to Dropbox.
I have provided you with an Excel spreadsheet called Last_FM_data_shuffled.xlsx. It contains the log of all the music I have listened to on my phone since I began using the Last.fm website. As the name implies, however, I have shuffled the entries so that they are no longer in chronological order. There is a header row at the top of the spreadsheet, and there are four columns of data: Band, Album, Song, and Date.
- Assuming you are not using packages that let you read from Excel, what must you do first in order to prepare this data to import to an R dataframe? What command will you use to import it?
For this problem, submit a .r file where the first line is a comment telling me what you have to do, and the second line is the R command to import the data. Remember that # is the comment character.
- What is a single R command that can be used to count how many different bands are represented in the data file?
- Write an R script that will sort the data back into chronological order and store it in a new dataframe.
- Recall that the table() function can be used to quickly summarize data. As an example, assuming I have attached the dataframe with the song data, I can type
    head(table(Song))
and get the following output:
    Song
    (Song For My) Sugar Spun Sister                            1901                              45
                                  2                               1                               2
             50 Ways to Say Goodbye            6th Avenue Heartache                      8:02:00 PM
                                  1                               2                               1
Each song title appears as a column heading, and the number underneath it is the number of times that song appears in the Song column of the dataframe.
Using this, what is the R command to determine the name of the song that has been played the most times? What is the R command to determine how many times that song has been played?
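To see how the result of table() is laid out before working on this problem, here is a minimal sketch using a made-up vector. The titles "Alpha", "Beta", and "Gamma" and the variable names songs and counts are hypothetical and do not come from the spreadsheet. It only shows that the titles and the counts are stored side by side and can be pulled out separately.

```r
# Hypothetical toy data; these titles are NOT taken from the real listening log.
songs <- c("Alpha", "Beta", "Alpha", "Gamma", "Beta", "Alpha")

# table() builds one entry per distinct value, holding how often it occurs.
counts <- table(songs)
print(counts)

# The distinct titles are stored as the names of the table (sorted alphabetically).
titles <- names(counts)

# The counts themselves can be extracted as a plain integer vector, in the same order.
plays <- as.vector(counts)

# A single count can be looked up by title.
alpha_plays <- counts[["Alpha"]]
```

Because the names and the counts line up position by position, any function that works on a numeric vector can also be applied to the table.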
- Using R, determine the average number of songs I listened to per day over the time period in the dataset.