Wednesday, December 7, 2022

Emacs: Using SQLite as a Data Source

Using SQLite as a data source: a framework and an example

Andrew Hyatt (he/him)

In this talk, Andrew Hyatt shows how to use SQLite to store and retrieve information using the triples package. Afterwards, he will handle questions over BigBlueButton.

The following image shows where the talk is in the schedule for Sat 2022-12-03. Solid lines show talks with Q&A via BigBlueButton. Dashed lines show talks with Q&A via IRC or Etherpad.

Format: 22-min talk followed by live Q&A (done)
Etherpad: https://pad.emacsconf.org/2022-sqlite
Discuss on IRC: #emacsconf-dev
Status: Q&A finished, IRC and pad will be archived on this page


00:00:00.000 Introduction
00:00:56.220 Why SQLite
00:02:29.870 The triples package
00:04:50.209 Exercise: Emacs bookmarks
00:07:40.740 Creating bookmarks
00:09:31.366 Retrieving bookmarks
00:11:10.034 Backlinks
00:13:59.380 Extensible entities
00:15:57.008 EKG package
00:18:12.772 The code
00:19:43.780 Renaming tags
00:20:52.147 Conclusion

Description

Emacs can now be built with SQLite, giving native support for reading and writing to a database. With this, we can start seriously considering a SQLite-first approach: instead of storing data on the filesystem and using various ad-hoc solutions for metadata, we can use SQLite to store and search our data. This is essentially a tradeoff between the power and speed of SQLite and the universality of the filesystem. If we accept that this approach is useful, then a standard way to store information in a database may be useful and promote package interoperability, just as our single filesystem does. The triples package is an RDF-like database that supplies such a flexible system for storing and retrieving data from SQLite. A sample application, ekg, a replacement for org-roam, is shown using this, and the advantages of the triple design are explained.

For more information and the packages discussed here, see the triples and ekg pages.

Transcript

[00:00:00.000] Hello. I'm Andrew Hyatt. I've been working on Emacs, with Emacs, and to some extent on Emacs for a while. I've written the WebSockets library and Emacs calc tutorials. I've enjoyed use of many of everyone's incredible packages. So my thesis for this talk, why I'm giving this talk, is that I'm interested in SQLite. I think we should be exploring SQLite for applications in ways I think the community has shied away from. I'd like to introduce the triples package as a way, both easy and with interesting functionality, that will allow us to build extensible databases in a way that is, I think, a little bit unusual and perhaps compelling, I hope.

[00:00:56.220] So first of all, why SQLite? Why is this good? Well, SQLite is now built into Emacs. So you get a few things out of this when you use it for data. First of all, it's a database. It's extremely good for data, of course. There's a simplicity to data manipulation using a database, compared to data manipulation, that is, manipulating data in a text file. Text files are really not built for data. So typically when you need to do this, like I know Org Mode is-- which I'm a huge, huge Org Mode fan-- it's all about sort of data in text. It does work, but you certainly would be hard pressed to make sweeping changes to your database that is represented in text. It's just not well suited for this sort of thing. It would take a long time, where the speed of SQL is incredibly impressive. I think certainly Emacs is not known for being extremely speedy. I think the overuse of text is part of this. Of course, text and using text, using files has awesome advantages as well. I'm really here to talk about the other side of the coin, right? Everyone can judge those advantages and disadvantages and make their own trade-offs, but I want to kind of make the pitch for SQLite.

[00:02:29.870] So let's talk about the triples package. The triples package is a package that is designed to give you a very generic schema. For most of the common operations, you don't have to write SQL yourself. A lot of stuff is built in and is based on a very generic schema. That is, it's a single table. That table has, of course, fixed schema. It basically has three columns. It actually has four columns. In this talk, I'm not going to get into the fourth column and why, but it's useful. So the three columns are subject, predicate, and object. This is related to what we call an RDF format. These things basically describe a link. The link is from the subject to the object. The link type is a predicate. That sounds overly theoretical, but the point is that you can describe a lot of things with this format. You can probably describe everything with it. It's very simple because the schema is fixed. It's only this kind of data. That means for your application, you define a schema in subject, predicate, object format. That defines what data you can use, what types there are, what properties they have, how you can use the system, and what is legal to do. And this is stored as data. As Lisp people, I think we're all very on board with the fact that you have a simple way to express everything, and you don't have these two systems. In this way, you don't have to have code as a system. Do you have to load code to use the triples package to make sure your schema is obeyed? No, it's all just built in to this database. I'll describe this. As I said, it's a little bit abstract right now, but it will become a lot clearer when we go through an example, which we're going to do now.
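To make the single-table idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not the triples package or its API. The table name, the in-memory database, and the sample row are assumptions made for illustration, and the fourth column mentioned in the talk is left out.

```python
import sqlite3

# Illustrative only: one fixed table with three columns, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

# One row is one link: subject --predicate--> object.
# The values below are hypothetical sample data.
conn.execute(
    "INSERT INTO triples VALUES (?, ?, ?)",
    ("emacs", "written-in", "lisp"),
)

rows = conn.execute("SELECT subject, predicate, object FROM triples").fetchall()
print(rows)  # [('emacs', 'written-in', 'lisp')]
```

Because the table layout never changes, any program that knows the three column names can read or extend the data.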

[00:04:50.209] As an exercise, let's create Emacs bookmarks, which basically are three things: a name, a file, and an annotation. I may be missing out on functionality. Of course, everything in Emacs has lots and lots of functionality, but let's just start with this simple thing. First of all, we're going to open up a database. Pretty simple. I think there's nothing to explain there. But here on this line that I'm on right now, we are saying, okay, there's going to be a type called bookmark. It's going to have the following properties. First, a file, which is unique and a string. The second is an annotation, which is again unique and a string. Then we're going to have another type called named. First of all, why isn't name part of bookmark? As I'll get into, it's interesting when you start sharing this database with other things, not just bookmarks, but other types. Bookmarks are very similar to many other things that you might want to expand into. Those other things have names, but they're not bookmarks. It's nice to separate these concerns out and just have another type called named, which just basically has a name. We can execute this. It's not interesting to look at these. It outputs something that is not all that useful for anything, but that's okay. What's done is, actually, it's created a database and it's populated it with the schema. We can look at this. We won't go through all of this because it's a little bit too much for a short presentation like this, but you could see that there's something here that's like, oh, we have a subject bookmark. We have a property base/type. That just means that there's a property that's defined by the base. This means this is from the triples package itself. It's not some other package. Third is, what is the type of this object? It's a schema. This thing could be many types. As I said, if you have a-- we haven't seen an example yet, but if you have a bookmark, it's going to have a name type and a bookmark type. Everything is multi-typed and that's kind of a feature of this kind of storage system. I'm not going to go through everything, but you can see it's all there in triples, the whole schema, everything we just did.
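The point that the schema is itself stored as rows can be sketched in Python with sqlite3. This is not the triples package's code. The row for bookmark with base/type and schema follows what the talk shows on screen; the helper name add_schema and the other predicates (schema/property, base/unique) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

def add_schema(conn, type_name, properties):
    """Record a type and its properties as ordinary rows (hypothetical layout)."""
    rows = [(type_name, "base/type", "schema")]
    for prop, (value_type, unique) in properties.items():
        rows.append((type_name, "schema/property", prop))
        rows.append((f"{type_name}/{prop}", "base/type", value_type))
        rows.append((f"{type_name}/{prop}", "base/unique", str(unique)))
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)

# The two types from the talk: bookmark (file, annotation) and named (name).
add_schema(conn, "bookmark", {"file": ("string", True), "annotation": ("string", True)})
add_schema(conn, "named", {"name": ("string", True)})

schema_types = [row[0] for row in conn.execute(
    "SELECT subject FROM triples "
    "WHERE predicate = 'base/type' AND object = 'schema' ORDER BY subject")]
print(schema_types)  # ['bookmark', 'named']
```

Since the schema lives in the same table as the data, a second program can discover which types exist with a plain query and no shared code.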

[00:07:40.740] Let's create a bookmark. Again, we're going to connect to our database and we're going to basically set an entire subject. The subject is, it's basically like an entity. We're going to define a whole entity or you could refer to it as an object. That's a perfectly fine way to look at it, I think. It's going to have some identifier. That identifier could be anything. It doesn't have to be a string. It could really be anything, but we're going to give it a string called emacs-init. It does not matter what this identifier is, at least for our purposes. It does matter when you're linking to it, but I think nothing that I'm about to show you. This could truly be anything because it has a separate name. I'm giving it a name here, which is init. I'm just specifying the named type here and I'm specifying the bookmark type here and its values. We're going to do that. Now, if we look, we see everything we saw before, but now we have subject emacs-init. It has a type and it's named. We also see the same thing two lines down. This type is also a bookmark, but the name is init in the named/name, which is like the type is name, named, and the property is name. It's init. As you can see, this is just how everything looks. It's pretty straightforward and you can retrieve it. Now, we're looking at the database, but you don't really have to look at the database for... In fact, I think we're done looking at the triples format, because I think it's very simple. You've already got the hang of it, I think.
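As a rough sketch of what setting an entire subject could look like at the table level, here is a Python version using sqlite3. The function name set_subject, the file path, and the annotation text are made up for illustration; the row layout (a base/type row per type plus one type/property row per value) follows what the talk describes, but this is not the package's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

def set_subject(conn, subject, types):
    """Replace everything stored about `subject` with the given typed properties."""
    with conn:  # one transaction: delete the old rows, then insert the new ones
        conn.execute("DELETE FROM triples WHERE subject = ?", (subject,))
        for type_name, props in types.items():
            conn.execute(
                "INSERT INTO triples VALUES (?, 'base/type', ?)", (subject, type_name))
            for prop, value in props.items():
                conn.execute(
                    "INSERT INTO triples VALUES (?, ?, ?)",
                    (subject, f"{type_name}/{prop}", value))

set_subject(conn, "emacs-init", {
    "named": {"name": "init"},
    # Hypothetical values for the bookmark properties.
    "bookmark": {"file": "~/.emacs.d/init.el", "annotation": "My init file"},
})

rows = conn.execute(
    "SELECT predicate, object FROM triples WHERE subject = 'emacs-init'").fetchall()
print(len(rows))  # 5
```

The five rows are two type markers and three property values, which matches the shape of the listing described in the talk.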

[00:09:31.366] Let's retrieve that just to make sure, yes, we can retrieve it. We're going to retrieve it and say, okay, what do we get when we load the emacs-init subject? Well, we get a plist of all of its properties, which then you can use in your application. There's many more ways to retrieve and there's many more ways to save. In fact, I think the way I did it here with set subject is probably not the right way to do it most of the time. It's usually because it'll erase everything. It's only really to be used when you're sure you control all the data, but you're never sure because there could be other packages that are also using this database, they could have their own data. You don't want to erase all the other Emacs init subject data. In this case, we did because we are confident it was a new entity, but in most cases, the right thing to do is just set it by type. Just say, we're just going to set the bookmark type, which is... The properties are this, and then the name type and the properties are that. That's a way that you could do things. There's also many retrieval types. The retrieval types: you can retrieve by a number of different ways, which I'm not going to get into, but you can read about in either the source or the readme in the package.
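The difference between replacing a whole subject and setting a single type can be sketched in Python with sqlite3. The helper names get_subject and set_type, the file paths, and the use of a dict in place of an Emacs Lisp plist are assumptions for illustration, not the package's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("emacs-init", "base/type", "named"),
    ("emacs-init", "named/name", "init"),
    ("emacs-init", "base/type", "bookmark"),
    ("emacs-init", "bookmark/file", "~/.emacs.d/init.el"),  # hypothetical path
])

def get_subject(conn, subject):
    """Return the subject's properties as a dict keyed by 'type/property'."""
    cur = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND predicate != 'base/type'", (subject,))
    return dict(cur.fetchall())

def set_type(conn, subject, type_name, props):
    """Replace only the rows belonging to one type, leaving other types alone."""
    prefix = type_name + "/"
    with conn:
        conn.execute(
            "DELETE FROM triples WHERE subject = ? AND substr(predicate, 1, ?) = ?",
            (subject, len(prefix), prefix))
        conn.execute(
            "DELETE FROM triples "
            "WHERE subject = ? AND predicate = 'base/type' AND object = ?",
            (subject, type_name))
        conn.execute(
            "INSERT INTO triples VALUES (?, 'base/type', ?)", (subject, type_name))
        conn.executemany(
            "INSERT INTO triples VALUES (?, ?, ?)",
            [(subject, prefix + prop, value) for prop, value in props.items()])

set_type(conn, "emacs-init", "bookmark", {"file": "~/.config/emacs/init.el"})
result = get_subject(conn, "emacs-init")
print(result["named/name"])  # init
```

The named/name row survives the update because set_type only touches rows whose predicate starts with the given type, which is the property that makes it safe when several packages share one database.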

[00:11:10.034] We have backlinks as well. Let me explain what backlinks are. There's another feature of the triples. As I mentioned, these things can be thought about as links, but what could be a link in one direction could also be a link in the other direction, and we basically get this for free. Here's an example where we are again connecting to our bookmark database. Here we're going to add a new type called tagged. We're going to give everything tags. Here we're saying, okay, there's a type called tagged and it has a property called tags. This is not unique, so it's a list basically. It's a list of strings. There also is a type called tag. This is for things that are tags themselves. Then it has a property called members and it has what we call a virtual reversed property. It's virtual because it's not actually stored. We just compute it by reversing the subject and the object. This is on tagged tags. When we query this, we can just say, okay, what are all the subjects that have tagged tags of me, the tag? In fact, let me demonstrate that for you. We're going to set the type on emacs-init. We're going to add the "tagged" type. This is the alternate way of setting data that I mentioned. This won't erase anything else. We're just adding something here. We're adding this type "tagged" to our previous bookmark emacs-init. We're going to add emacs and config as the tags. We're going to then set emacs as a tag and config as a tag. That just lets us have this virtual property. You have to do something. You can't get it out of thin air. The design decision we've made is: you at least need to tag it before you get the free property. What you should see... Let's try it out. We got the subject config, which we've set no data on. You can tell we're not setting any data. If I get that subject, the result is that it says its members are emacs-init. That's what a virtual reversed property is. As we tag more things, this just continues to work because it's just doing a SQL query here.
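A virtual reversed property amounts to running the same link query with subject and object swapped. The Python sketch below uses sqlite3 to show the idea; the helper name members and the exact predicate strings are assumptions for illustration, not the package's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    # emacs-init is tagged with two tags (a non-unique property: one row per value).
    ("emacs-init", "base/type", "tagged"),
    ("emacs-init", "tagged/tags", "emacs"),
    ("emacs-init", "tagged/tags", "config"),
    # The tags themselves only carry a type marker; no members are stored.
    ("emacs", "base/type", "tag"),
    ("config", "base/type", "tag"),
])

def members(conn, tag):
    """Compute the reversed link: every subject whose tagged/tags points at `tag`."""
    cur = conn.execute(
        "SELECT subject FROM triples "
        "WHERE predicate = 'tagged/tags' AND object = ? ORDER BY subject", (tag,))
    return [row[0] for row in cur]

config_members = members(conn, "config")
print(config_members)  # ['emacs-init']
```

Nothing about members is written to the table, so the result stays correct as more subjects are tagged.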

[00:13:59.380] Besides showing off the backlinks function, this also shows off the general way you can have extensible entities. That is, it's possible that someone writes a bookmarks package that stores everything in a database, in the triples database, but then someone else can come and say, okay, I'm going to define my own types that are meant to work with this database, just like someone could do what I did here, just to add simple tagging. It's very easy to do. This stuff is not that easy to do otherwise. To do this in Lisp, I would say it's a little awkward. With databases, again, it's not only possible, it's relatively trivial, especially with this kind of database. The benefit is it's super easy to work with. With this kind of generic database, the drawback is it's not as efficient as a special purpose table that is really built for efficiencies. A lot of times you have to do multiple lookups and things like that. Again, it's a trade-off for various things. As you can see, this is like one database for everything. That means that we don't have to all contribute to one giant database. All the packages that use triples, it doesn't have to be one database, but it's cool if it is. I don't know what I want to happen or what I expect to happen, but I think an interesting property is that this is a way for lots of data to live together and build off each other in ways that I think are hard to do with other forms of table layouts and things like that.

[00:15:57.008] Let's talk about a use of it, which is the EKG package. The EKG package is something I've written to demonstrate the triples library and use it for something I think is interesting, which is personal knowledge management systems of the same type, of the same genre that Org Roam is, but with different design decisions. I'll show it in action for a little bit. Let's just look at... it's all tag-based, same kind of tags we saw before when we were playing around with the bookmarks example application, but everything here is... What I want to show is: I can look at a tag. I could see notes with that tag. Everything you see here is in the database, no files involved. All of this is just a thing that's an entire object, the entire string. It works. It has tags. You can see not only this tag, but all the other tags associated with it and their notes. It's kind of an interesting way to do things. When you capture it, I think it's interesting. There's a lot of interesting design elements here. This tags thing is not part of the buffer. It's not like Org Roam. You see here in this other tags, these are things I've imported from Org Roam. This is why they look like they do. They have their own titles because I just wrote them all in Org Roam. What it looks like, really, for these notes is that it's just text. You really don't have to bother with this metadata. If I want another tag like pancakes, I can just add it here. Again, these tags will turn into data, triple data. The text is just triple data as well, but different triple data. All these things are like that. You can open up any of these things, et cetera.

[00:18:12.772] I think the interesting thing here is to see the code. It's not super interesting to look at code for too long, but we don't have that long. Whenever we connect-- I just want to point out a few things. Whenever we connect, we have a schema. We just do this. Every time we connect, we just make sure it has the right schema. This way, the user is up to date. This schema just looks exactly like stuff I showed you in the triples when we were looking at bookmarks. It's not complicated. I even have stuff here with people to use this as a person database. I haven't figured out how I'm going to use this yet, but you can see just to do this, it's really trivial and it's pretty easy. Let's show a few other things, like getting all the tags. Here, we could just say, let's get the subjects of type tag. We have all the triples, all the ones that are of type tag, all the things that have a subject. All the subjects that have links, that have this type, this tag in it, we can return them all. Basically, it just gives you a list of all these objects. Again, you can think of these things as objects. All the objects of type tag, we'll just get them all. Super, super simple. Triples gives you this functionality out of the box. It's not that complicated.
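Getting every subject of a given type reduces to one query on the type marker rows. The Python sketch below shows that with sqlite3; the helper name subjects_of_type and the sample rows are assumptions for illustration and are not taken from ekg or the triples package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("emacs", "base/type", "tag"),
    ("config", "base/type", "tag"),
    ("pancakes", "base/type", "tag"),
    ("emacs-init", "base/type", "bookmark"),  # not a tag, so it is filtered out
])

def subjects_of_type(conn, type_name):
    """List every subject that carries a base/type row for `type_name`."""
    cur = conn.execute(
        "SELECT DISTINCT subject FROM triples "
        "WHERE predicate = 'base/type' AND object = ? ORDER BY subject",
        (type_name,))
    return [row[0] for row in cur]

all_tags = subjects_of_type(conn, "tag")
print(all_tags)  # ['config', 'emacs', 'pancakes']
```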

[00:19:43.780] What I would like to show, and that shows my thesis for this whole talk, is this rename tag. Now, think about how you would rename a tag in Org Roam or anything where the tag is part of the files. It's like how you would re-tag everything in Org Mode. It's complicated and error-prone and slow. This, if anything, is instantaneous and super easy. Look, that's it. There's not that many places for an error to live here. One thing I would like to point out is that we are doing direct-- not everything has to go through the triples package. Maybe it should, but the triples package is a fixed format, which is why it's okay-- whether it's a good idea, I'm not sure, but it's okay for client packages to just directly manipulate the tables. Here, we're just doing it just to update all the tags and then remove and set types so that the correct thing happens. As you can see, it's super, super simple.
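A rename that touches every tagged item can be a couple of SQL statements in one transaction. The Python sketch below shows the general shape with sqlite3; it is not ekg's actual rename code, and the function name rename_tag, the note identifiers, and the predicate strings are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("note-1", "tagged/tags", "config"),
    ("note-2", "tagged/tags", "config"),
    ("note-2", "tagged/tags", "emacs"),
    ("config", "base/type", "tag"),
    ("emacs", "base/type", "tag"),
])

def rename_tag(conn, old, new):
    """Rename a tag everywhere in one transaction (illustrative sketch)."""
    with conn:
        # Repoint every tagged subject from the old tag to the new one.
        conn.execute(
            "UPDATE triples SET object = ? "
            "WHERE predicate = 'tagged/tags' AND object = ?", (new, old))
        # Remove the old tag's type marker and set it on the new tag.
        conn.execute(
            "DELETE FROM triples "
            "WHERE subject = ? AND predicate = 'base/type' AND object = 'tag'",
            (old,))
        conn.execute("INSERT INTO triples VALUES (?, 'base/type', 'tag')", (new,))

rename_tag(conn, "config", "configuration")
tagged = [row[0] for row in conn.execute(
    "SELECT subject FROM triples "
    "WHERE predicate = 'tagged/tags' AND object = 'configuration' ORDER BY subject")]
print(tagged)  # ['note-1', 'note-2']
```

With tags stored in files, the same rename would mean finding and rewriting every file that mentions the tag; here the number of statements stays the same however many notes are tagged.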

[00:20:52.147] I think this proves my thesis about the advantages of applications with SQLite. Thank you for listening. I hope this puts ideas in your minds about taking advantage of this functionality. I hope to see more things using the triples library or otherwise that take advantage of this. Thank you for your time.

Captioner: sachac

Questions or comments? Please e-mail emacsconf-org-private@gnu.org



