Golang: Extract data from an RSS feed

If you are looking at Go for the first time, know that although its syntax can seem confusing at first glance, it becomes easy to read after two or three hours of practice. For my part, I first discovered the Go language (Golang) through my latest love… Docker! I then attended a meetup on the subject and, after an hour spent on a project that is now buried deep in my GitLab, I started work on two projects in which the performance and maintainability of the code are key to viability.

For this post, I extracted a problem from one of my projects. In my opinion, using Go here is a great asset: this recent language is extremely easy to compile and makes it simple to use external libraries when they are needed, which is not always the case since Go ships with a well-chosen variety of packages. In this case, the package is encoding/xml.

The following example demonstrates extracting data from an RSS feed, but the method can easily be adapted to parse a sitemap or any other well-formed XML document.
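To give an idea of how the same approach carries over to a sitemap, here is a minimal sketch. The type names URL and URLSet, the helper parseSitemap, and the sample document are assumptions made for illustration; only the urlset, url, loc and lastmod element names come from the sitemap format itself.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// URL mirrors one <url> entry of a sitemap (hypothetical type name).
type URL struct {
	Loc     string `xml:"loc"`
	LastMod string `xml:"lastmod"` // optional element, empty when absent
}

// URLSet mirrors the <urlset> root element (hypothetical type name).
type URLSet struct {
	XMLName xml.Name `xml:"urlset"`
	URLs    []URL    `xml:"url"`
}

// sampleSitemap is made-up data used only for this illustration.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.org/</loc>
    <lastmod>2015-08-20</lastmod>
  </url>
  <url>
    <loc>http://www.example.org/actu1</loc>
  </url>
</urlset>`

// parseSitemap decodes a sitemap document into a URLSet.
func parseSitemap(data []byte) (URLSet, error) {
	var set URLSet
	err := xml.Unmarshal(data, &set)
	return set, err
}

func main() {
	set, err := parseSitemap([]byte(sampleSitemap))
	if err != nil {
		log.Fatal(err)
	}
	// Print each location with its last modification date, if any.
	for _, u := range set.URLs {
		fmt.Println(u.Loc, u.LastMod)
	}
}
```

Only the struct tags change compared with an RSS feed; the call to xml.Unmarshal stays the same.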

Demonstration

To illustrate my work, I took the RSS feed given as an example on the Wikipedia page.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>My website</title>
    <description>This is an example of an RSS 2.0 feed</description>
    <lastBuildDate>Sat, 07 Sep 2002 00:00:01 GMT</lastBuildDate>
    <link>http://www.example.org</link>
    <item>
      <title>News No. 1</title>
      <description>This is my first news</description>
      <pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate>
      <link>http://www.example.org/actu1</link>
    </item>
    <item>
      <title>News No. 2</title>
      <description>This is my second news</description>
      <pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate>
      <link>http://www.example.org/actu2</link>
    </item>
  </channel>
</rss>

Parsing this feed takes no more than 45 lines, package and import declarations included. Here it is:

package main

import (
	"encoding/xml"
	"io/ioutil"
	"log"
	"os"
)

// Item maps one <item> element of the feed.
type Item struct {
	XMLName     xml.Name `xml:"item"`
	Description string   `xml:"description"`
	Link        string   `xml:"link"`
	PubDate     string   `xml:"pubDate"`
	Title       string   `xml:"title"`
}

// Rss maps the <rss> root; "channel>item" reaches the items nested in <channel>.
type Rss struct {
	XMLName xml.Name `xml:"rss"`
	Items   []Item   `xml:"channel>item"`
}

func main() {
	// Read the whole feed into memory (since Go 1.16, os.ReadFile replaces ioutil.ReadFile).
	file, err := ioutil.ReadFile("/go/news.rss")
	if err != nil {
		log.Println(err)
		return
	}
	// Decode the XML into the structures defined above.
	v := Rss{}
	err = xml.Unmarshal(file, &v)
	if err != nil {
		log.Println(err.Error())
		os.Exit(1)
	}
	// Log the fields of each item.
	for _, item := range v.Items {
		log.Println("Title: ", item.Title)
		log.Println("Description: ", item.Description)
		log.Println("Link: ", item.Link)
		log.Println("PubDate: ", item.PubDate)
	}
}

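This code starts by defining the structures that mirror the feed: Item describes one entry, and Rss uses the channel>item tag to collect every item nested inside the channel element. The main function then reads the file news.rss, which contains the content of your feed, decodes it with xml.Unmarshal, and logs the fields of each item.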
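The same structure-mapping idea extends to channel-level fields and to dates. The sketch below is one possible variation, not the code used in my project: the type names Feed and Channel, the helpers parseFeed and parseDate, and the trimmed sample feed are assumptions for illustration. It assumes dates in the format shown in the example feed, which matches time.RFC1123; feeds that use a numeric time zone would need another layout such as time.RFC1123Z.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"time"
)

// Item keeps only the fields needed for this illustration.
type Item struct {
	Title   string `xml:"title"`
	PubDate string `xml:"pubDate"`
}

// Channel holds feed-level fields plus the items (hypothetical type name).
type Channel struct {
	Title         string `xml:"title"`
	LastBuildDate string `xml:"lastBuildDate"`
	Items         []Item `xml:"item"`
}

// Feed mirrors the <rss> root element (hypothetical type name).
type Feed struct {
	XMLName xml.Name `xml:"rss"`
	Channel Channel  `xml:"channel"`
}

// sampleFeed is a trimmed copy of the example feed.
const sampleFeed = `<rss version="2.0">
  <channel>
    <title>My website</title>
    <lastBuildDate>Sat, 07 Sep 2002 00:00:01 GMT</lastBuildDate>
    <item>
      <title>News No. 1</title>
      <pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate>
    </item>
  </channel>
</rss>`

// parseFeed decodes an RSS document into a Feed.
func parseFeed(data []byte) (Feed, error) {
	var f Feed
	err := xml.Unmarshal(data, &f)
	return f, err
}

// parseDate converts a date such as "Sat, 07 Sep 2002 00:00:01 GMT"
// into a time.Time, assuming the RFC 1123 layout.
func parseDate(s string) (time.Time, error) {
	return time.Parse(time.RFC1123, s)
}

func main() {
	feed, err := parseFeed([]byte(sampleFeed))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Feed:", feed.Channel.Title)
	for _, item := range feed.Channel.Items {
		published, err := parseDate(item.PubDate)
		if err != nil {
			log.Println("skipping unparsable date:", err)
			continue
		}
		fmt.Println(item.Title, published.UTC().Format("2006-01-02"))
	}
}
```

Using a nested Channel struct instead of the channel>item tag is a design choice: it costs one more type but gives access to the feed title and build date alongside the items.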

Result

Here is the result:

root@6c38532f4658:/go# go run main.go
2015/08/20 07:05:50 Title: News No. 1
2015/08/20 07:05:50 Description: This is my first news
2015/08/20 07:05:50 Link:  http://www.example.org/actu1
2015/08/20 07:05:50 PubDate: Sat, 07 Sep 2002 00:00:01 GMT
2015/08/20 07:05:50 Title: News No. 2
2015/08/20 07:05:50 Description: This is my second news
2015/08/20 07:05:50 Link:  http://www.example.org/actu2
2015/08/20 07:05:50 PubDate: Sat, 07 Sep 2002 00:00:01 GMT

And there you have it: the feed is parsed successfully! Feel free to respond and provide feedback if this article can be improved.
