We did a project using R (the programming language) for the first time, together with RStudio, a development environment for R. The data we collected covers air pollutants and some wind properties.
Process and code, DATA SECTION:
In this section you will see all the data and documents that we will be using to create the graphics.
First of all, I had to connect R and RStudio to get to the main interface. Next, I had to install three libraries that are required for the following steps. This is the code to install the libraries:
Or, if it doesn't work:
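The install commands themselves are not shown in this text. As a rough sketch only, assuming the three libraries are openair, dplyr and tidyr (a guess based on the functions used later, such as timeAverage, mutate and pivot_wider), this is one way to check which packages still need installing. The helper name missing_packages is made up for this example.

```r
# Assumed package list; the original three libraries are not named in the text
wanted <- c("openair", "dplyr", "tidyr")

# Hypothetical helper: return the packages from `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install whatever is missing (needs an internet connection)
# if (length(to_install) > 0) install.packages(to_install)
```

If the print shows an empty result, everything in the assumed list is already available.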
Secondly, we have to download the data that we will be using and organize it. I downloaded the data from here: Pollutant data and Wind data (these websites are in my native language). The data has to be in .csv format (you can check it in the properties of the document).
Create a new project in RStudio. Here you can see the code: city is the name of the document, and what is in parentheses is the location of the document:
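As a minimal sketch of this step, assuming the file is read with base R's read.csv: the file name, location and columns below are placeholders, and a small temporary file stands in for the real download.

```r
# Placeholder: write a tiny CSV to a temporary location to stand in for the real file
path <- file.path(tempdir(), "city.csv")
writeLines(c("date,pollutant,value",
             "2020-01-01,PM10,21",
             "2020-01-01,NO2,35"), path)

# `city` is the name of the data frame; the path in parentheses is where the file lives
city <- read.csv(path, stringsAsFactors = FALSE)
str(city)
```

With the real data, only the path changes, for example the folder where the downloaded .csv was saved.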
We delete T 00.00.00.000 and replace the hours h01, etc. with 01:00:00, and so on, until the dates are in ISO format:
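The cleaning can also be done in R. This is only a sketch in base R, assuming the raw dates end in a "T..." time suffix and the hours look like h01 to h24; the sample values are made up and the real files may differ slightly.

```r
# Hypothetical raw values, as they might look in the downloaded file
raw_date <- c("2020-01-01T00:00:00.000", "2020-01-02T00:00:00.000")
raw_hour <- c("h01", "h13")

# Drop everything from the "T" onwards, keeping only the calendar date
clean_date <- sub("T.*$", "", raw_date)

# Turn "h01" into "01:00:00"
clean_hour <- sub("^h([0-9]{2})$", "\\1:00:00", raw_hour)

print(clean_date)
print(clean_hour)
```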
We combine date and time in city4.csv using LibreOffice Calc, RStudio or Visual Studio Code:
city4 <- city3 %>% mutate(name=paste0(data, " ", hour))
We create city5 by joining day and time in a single column named date.
We convert the date format:
city5PM10 <- subset(city5, pollutant=="PM10")
city5PM10$date<-as.POSIXct(city5PM10$date,format="%Y-%m-%d %H:%M:%S", tz="Europe/Madrid")
Make sure that the date column is not a set of characters (class character) but a POSIXct date-time.
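A quick way to check this is to look at the class before and after the conversion. The two date strings below are invented for the example.

```r
# Hypothetical date-time strings in ISO format
x <- c("2020-01-01 01:00:00", "2020-01-01 02:00:00")
class(x)   # "character"

d <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "Europe/Madrid")
class(d)   # "POSIXct" "POSIXt"

# A simple guard to run before calling the plotting functions
stopifnot(inherits(d, "POSIXct"))
```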
Reshaping the data with pivot_wider, so that each pollutant gets its own column:
city6<-pivot_wider(city5, names_from= pollutant, values_from =value)
city6$date<- as.POSIXct(city6$date,format="%Y-%m-%d %H:%M:%S",tz="Europe/Madrid")
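Daily and yearly averages with timeAverage:
daily<-timeAverage(city5PM10,avg.time = "day")
yearly<-timeAverage(city5PM10,avg.time = "year")
# city6 (one column per pollutant) is used here because the time plot of yearlyall asks for NO2, NOX, etc. by column name
yearlyall<-timeAverage(city6,avg.time = "year")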
episode<-selectRunning(city6, pollutant="O3",threshold=120, run.len=8)
R tells us that the level of 120 micrograms per cubic meter has been exceeded over runs of at least 8 consecutive hours (which is what run.len=8 selects, rather than an 8-hour average) a total of 95 times.
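To see what counting consecutive hours above a threshold means, here is a small base R sketch of the idea. It is not how openair implements selectRunning; the values are invented and a run length of 3 is used to keep the example short.

```r
# Hypothetical hourly O3 values in micrograms per cubic meter
o3 <- c(100, 125, 130, 121, 90, 122, 123, 124, 126, 80)

# TRUE where the threshold is exceeded
above <- o3 > 120

# rle() groups consecutive equal values into runs
runs <- rle(above)

# Count the runs that are above the threshold and at least run_len hours long
run_len <- 3
n_episodes <- sum(runs$values & runs$lengths >= run_len)
print(n_episodes)   # 2
```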
For the wind data, in my case:
wind2<-pivot_wider(wind1,names_from = CODI_VARIABLE, values_from = VALOR_LECTURA)
names(wind2)[names(wind2) == "31"] <- "wd"
names(wind2)[names(wind2) == "30"] <- "ws"
names(wind2)[names(wind2) == "DATA_LECTURA"] <- "date"
We delete the X column in LibreOffice and save the result as wind3.
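The same column can also be dropped directly in R. A minimal sketch on a toy data frame (the values are made up and only the column names follow the text):

```r
# Toy data frame with a stray X column
wind2_toy <- data.frame(X = 1:3,
                        date = c("2020-01-01 00:30:00",
                                 "2020-01-01 01:00:00",
                                 "2020-01-01 01:30:00"),
                        ws = c(1.2, 0.8, 2.1))

# Keep every column except X
wind3_toy <- wind2_toy[, names(wind2_toy) != "X", drop = FALSE]
names(wind3_toy)   # "date" "ws"
```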
Process and code, GRAPHIC SECTION:
trendLevel(city5PM10, pollutant = "value", main="PM10 evolution in MYCITYNAME")
Calendar Plot 1:
# city5NO2 is built like city5PM10 above, with subset(city5, pollutant=="NO2")
calendarPlot(city5NO2, pollutant="value", year="2020")
Calendar Plot 2:
calendarPlot(city5PM10, pollutant="value", year="2020")
Temporal variation of pollutants 1991-2021:
timeVariation(city6, pollutant=c("O3","NO2","H2S","NO","HCNM","CO","SO2","HCT", "NOX","PM10"), main="Air pollution in MYCITYNAME (1991-2021)")
Time Plot 1:
timePlot(selectByDate(yearlyall), pollutant = c("NO2", "NOX", "SO2", "NO", "O3", "PM10"), y.relation = "free", main="Yearly mean of air pollutants in Sant Andreu de la Barca")
Time Plot 2:
timePlot(selectByDate(city6), pollutant = c("NO2", "NOX", "SO2", "NO", "O3", "PM10"), y.relation = "free", main="Air pollutants in Sant Andreu de la Barca")
Pollution Rose 1:
This code is a bit special, so I will put it here so you can follow all the steps:
# Rows 2, 4, 6, ... of wind3; despite its name, toDelete holds the rows that are kept
toDelete <- seq(2, nrow(wind3), 2)
wind5<-wind3[ toDelete ,]
cityall<-merge(city6, wind5, by = "date")
pollutionRose(cityall, pollutant = "O3")
Pollution Rose 2:
pollutionRose(cityall, pollutant = "NOX")
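The row selection and the merge used for the pollution roses can be tried on toy data first. This sketch assumes the wind readings come every 30 minutes and the pollutant readings every hour, which the text does not state; all values are invented.

```r
start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")

# Toy wind data every 30 minutes: 00:30, 01:00, 01:30, 02:00, 02:30, 03:00
wind_toy <- data.frame(date = start + 1800 * (1:6),
                       ws = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5),
                       wd = c(10, 20, 30, 40, 50, 60))

# Keep rows 2, 4, 6: the readings on the hour
wind_hourly <- wind_toy[seq(2, nrow(wind_toy), 2), ]

# Toy hourly pollutant data: 01:00, 02:00, 03:00
poll_toy <- data.frame(date = start + 3600 * (1:3),
                       O3 = c(40, 55, 61))

# Rows are matched on identical date values
merged <- merge(poll_toy, wind_hourly, by = "date")
print(merged)
```

The merge only keeps rows whose date appears in both tables, so both date columns need the same class and time zone.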
With all the data that we have collected, we can see whether any values exceed the legal pollution limits, and with the wind graphic (Pollution Rose) we can see where the pollution comes from.
The graphics are a good way to detect high pollution values. I think this has been a good introduction to the R language and a good way to learn basic coding with RStudio, as it has been very practical to work for the first time with a large amount of data.