Irina Truong: How to create and publish a Python package


Video Transcription

OK, so let's start. My name is Irina Truong. I work as a backend engineer for a company called Parse.ly, and I'm going to be talking today about creating and publishing a Python package. For this talk, there is actually no need to know Python. If you do, that's great. If you don't, you will still understand everything that's going on. If you want to follow along and create your own package, you do need a little knowledge of Python and a working interpreter. To install a Python package, you would normally need a Python interpreter and pip, the Python package installer. Most likely, if you are on a Mac or on Linux, you already have that. If you are on Windows, you would have to install a Python distribution. There are several of them for Windows; I know specifically one from ActiveState. Here is the link, pip.pypa.io, to install pip if you don't have it yet. Python has two main types of packages. First of all, there are Python libraries. Python libraries have the following features: they are installable, importable, and versioned. What does that mean? Well, you run a command to install a library, and then you can import things from the library. So the library is going to expose functions for people to import.

And of course libraries are versioned. Libraries are usually written to target developers, not end users; they are useful for other applications. And here we come to another type of Python package, which is an application. Applications have all of the above features (installable, importable, versioned), but they are also runnable. So you can run an application and it will probably give you some nice output, like a web page, or, for a console app, it may print out something for you. An application is built to target end users. In this talk, I'm going to do a short demo with specifically a console application. So what's a console application? If you have worked as a developer on Linux, you have probably worked in a terminal a lot. A terminal window is this little black, boring window on my screen right now, but it doesn't have to be quite so black and boring. There are console applications with quite a rich user interface, and you can see a couple of screenshots here. I made screenshots of Midnight Commander, which is a file manager, and of htop, which lets you see your machine's activity, like processor load and things like that. So it doesn't have to be quite so black and boring, but normally it's just a little window where you type in commands and see the output. And this is going to be my application: you type in the command and it tells you today's message.

So it's going to tell you today is good for some kind of activity. Right now I'm going to show the code for this application. Let's say this is my Python code and it lives in a file called main.py. Right now I don't have anything else, a clean slate, just one file. This is my main.py. As you can see, at the top of the file I import the datetime and random packages. Those are two packages from the Python standard library. Then I'm writing a function called today_message. This function is going to return a string, and the string is going to say today is good for a certain activity. The activity is going to be picked randomly, but the randomness is going to be specific to the current datetime. OK, so now I have this function. The function can already be imported by other packages, but I am going to develop a runnable application here, and that's why I also need a main function, which is actually going to call my today_message and print the output. And on the right of my screen here, this is how you run it right now: python main.py. So you call the Python interpreter and, as an argument, you provide the script name. It tells me today is good for traveling. So can I share my code right now? It works, right? Why not? Well, it's kind of hard to share code right now. You would have to send the main.py module to people, they would need a Python interpreter in their path, and they would run python main.py and see the wonderful today message.
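The slide with the code is not part of the transcript, so the following is only a minimal sketch of what a main.py like the one described could look like. The list of activities, the exact wording of the message, and the idea of seeding the random choice with the calendar date are assumptions made for illustration, not the speaker's actual code.

```python
import datetime
import random

# Hypothetical activity list; the talk only shows "traveling" as one result.
ACTIVITIES = ["traveling", "coding", "reading", "cooking", "gardening"]


def today_message(today=None):
    """Return an advice string that stays the same for a given date."""
    if today is None:
        today = datetime.date.today()
    # Assumption: seed a private generator with the date's ordinal so the
    # "random" pick is repeatable for that date.
    rng = random.Random(today.toordinal())
    return "Today is good for {}.".format(rng.choice(ACTIVITIES))


def main():
    """Entry point: print today's message to the console."""
    print(today_message())


if __name__ == "__main__":
    main()
```

Saved as main.py, this would be run with `python main.py`, as in the talk. Keeping the logic in `today_message` and the printing in `main` is what lets other code import the function while still giving the console application something to call.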

Well, I think, how many people can I send my main.py file to? I don't know that many people. I want people from all over the world to be able to use it. So there must be a better way to share code, and for Python code, of course, there is. But before we can share it with the whole world, we need a few more things. First of all, a Python package has to conform to a specific code structure. Then you have to add a little bit of documentation and metadata, some directions for the installer tool, and then upload it to the packaging server. So your distribution needs to be uploaded to the centralized repository of Python packages. This is called PyPI. So our next step would be to create a package using our module. A Python file is just called a module; packages have a certain structure. First you have to create a parent directory for the package, and then inside that you create a directory with the same name as your package. In my case, it would be myadvisor. Then I would put my main.py file inside the myadvisor directory. And next to it, I would create a special file called __init__.py. You see it's written with double underscores around init. It's a special file that just tells the Python interpreter: this directory is a package.
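The directory layout described above can be sketched with a few lines of standard-library Python. This is only an illustration: the talk does not name the parent directory, so giving it the same name as the package is an assumption, and `create_skeleton` is a hypothetical helper, not something from the speaker's repository.

```python
import tempfile
from pathlib import Path


def create_skeleton(root, package="myadvisor"):
    """Create the package layout from the talk under `root` and return the project path."""
    project = Path(root) / package       # parent directory (name is an assumption)
    pkg_dir = project / package          # inner directory named after the package
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").touch()    # empty file that marks the directory as a package
    (pkg_dir / "main.py").touch()        # the module with the application code goes here
    return project


# Build the skeleton in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    project = create_skeleton(tmp)
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project))
```

The setup.py and readme files mentioned next in the talk would sit in the parent directory, next to the inner package directory.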

Next, I would have to create setup.py with instructions for the installer tool, and a readme file. The readme can be in several text-based formats; here I'm using a Markdown file. If you're going to share your code with people, if you're going to open source it, there are a few things that are nice to have. You would probably want to have a changelog and an authors file, a contributing guide, a list of requirements, and a license, and we're going to talk a little bit more about those later. So first of all, let's take a quick look at setup.py. Like I said, setup.py is basically instructions for the installer on how to install your package on the user's system. It is also a Python file. In the first line here, I import setuptools. Setuptools is also a Python package, but it's a package that knows how to install other packages. Then I call one function called setup from setuptools, providing a whole bunch of parameters. Let's go through the parameters.

First of all, there has to be a name, the name of the package, of course, and here it's myadvisor. Then you have to provide a version, which is 0.0.1; author name, my name; author email, that's my actual email here. Description: "Gives useful advice". Useful, it's very useful. And a long description: for the long description, I'm using my readme file here, and I'm going to say that my long description is a Markdown file, so the content type is set for that. And here is the second half of my setup.py. I have to provide a project URL. Usually they point to GitHub, but if you have a complicated application, you may even create a dedicated website for it. Next, I tell setuptools to go and find all packages in the current directory. And then I provide an entry point. What's an entry point? Basically, I'm going to say: when this package is installed, I want an alias to be created to call my main function from the user's shell. So instead of saying python main.py, they're just going to type the command called myadvisor in the console. Then I'm going to provide a list of classifiers. Here I specify that I developed this package in Python 3, I'm using an MIT license, and it's not dependent on any specific operating system. And optionally, you can specify that your package requires a specific version of Python. For example, you may be using some old language features.

And you need an old Python like 2.7, or you're using some advanced language features and you want to use 3.6 and up. I actually think there is no reason for people to be using Python 2 anymore. Python 2 finally reached its end of life, so everybody wants to be using Python 3 now. As for classifiers, usually you would want to include who the package is for (the audience of your package), which license it's using, which systems it can run on, the level of maturity, and possibly some other metadata. You can go to pypi.org/classifiers and see a large searchable list of all possible Python classifiers. OK, so what are entry points in Python? Entry points are a very useful packaging feature. They allow you to export functions or to extend your package with plugins. Right now, we exported one function here. This is a function called main inside the package myadvisor, and I created an alias, or a shortcut, for it called myadvisor. You can also extend your package with plugins, so your package could provide entry points for other packages to plug into and add some functionality to your package. This one is a very fun one, but we're not going to be talking about it here. So here is my minimal readme file. You don't need that much information about my package because it's very simple. It just has a title and one paragraph about what it does.
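To make the setup.py walk-through concrete, here is a rough sketch of the metadata it describes, written as a dictionary that could be passed to `setuptools.setup()`. The author, email, and URL values are placeholders, the readme text is inlined instead of being read from a file, and `parse_console_script` and `run_setup` are hypothetical helpers added only to show what the entry point string means. It is not the speaker's actual file.

```python
# Metadata in the spirit of the talk; author, email, and URL are placeholders.
SETUP_KWARGS = {
    "name": "myadvisor",
    "version": "0.0.1",
    "author": "Your Name",
    "author_email": "you@example.com",
    "description": "Gives useful advice",
    # A real setup.py would read this text from the readme file instead.
    "long_description": "# myadvisor\n\nGives useful advice.\n",
    "long_description_content_type": "text/markdown",
    "url": "https://example.com/myadvisor",
    # "command=package.module:function" creates a shell command on install.
    "entry_points": {"console_scripts": ["myadvisor=myadvisor.main:main"]},
    "classifiers": [
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    "python_requires": ">=3.6",
}


def parse_console_script(spec):
    """Split 'command=package.module:function' into its three parts."""
    command, target = spec.split("=", 1)
    module, function = target.split(":", 1)
    return command.strip(), module.strip(), function.strip()


def run_setup():
    """Hand the metadata to setuptools; a real setup.py does this at module level."""
    import setuptools  # imported here so the sketch loads without setuptools installed

    setuptools.setup(packages=setuptools.find_packages(), **SETUP_KWARGS)


# prints ('myadvisor', 'myadvisor.main', 'main')
print(parse_console_script(SETUP_KWARGS["entry_points"]["console_scripts"][0]))
```

The sketch deliberately does not call `run_setup()`, because `setup()` expects to be driven by a command such as the build commands discussed later in the talk.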

But perhaps if you had a more complicated package than mine, you could describe all the different parameters and options and use cases, how to call it, how to use it. You could include all kinds of information in the readme. So the next thing to do is packaging it up. First, I'm going to install a couple more Python tools. I'm going to use pip and call pip install setuptools wheel. Setuptools and wheel are two packages that are going to be used to create a distribution. Once I've done that, I'm going to call python setup.py sdist bdist_wheel. What does it all mean? sdist: this option is going to create a source distribution out of my package. A source distribution is basically going to wrap my source code in a tar archive, and on the user's system it will have to be built and put in the right place. bdist_wheel is going to create a wheel distribution, which is a built distribution. This universal flag says that my wheel is good for any Python and any system architecture. So what just happened after I ran those commands? I'm going to have a dist subdirectory in my parent directory, and there are going to be two files there. As I just said, the command above created two types of Python distributions. First, we have a source distribution, this is the tar file, and second, there is the built distribution, and that's a wheel file.

So .whl files are called wheels. Let's look inside. The tar file is an archive, and we can list the contents of the archive and see what's inside. Here on this slide, all the files in black are my files; you just saw that I created them and added them inside my package. And those files in blue, I did not have those before. Setuptools went and created those files and put some metadata in them. If you actually build your archive and take a look inside, those extra files are usually just text files. You can see the contents if you're curious, but you don't actually need to know what goes inside. Setuptools deals with all of that. The .whl file is a built distribution, but the .whl extension is just hiding that it's really a zip archive. So we can look inside the zip archive and see that it has all the same files: __init__.py and main.py, those are mine, and again a bunch of files where it puts some metadata, and it created a WHEEL file. All the metadata, if you look at it, is pretty much taken from my setup.py, where I put in all the extra information about my package. So why do we need two types of distributions?
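The "look inside" step from this passage can also be done from Python with the standard library. The sketch below assumes the source distribution is a gzip-compressed tar file and relies on a wheel being a zip archive; `list_sdist` and `list_wheel` are hypothetical helper names, and the archive it inspects is a tiny stand-in built in memory rather than a real build output.

```python
import io
import tarfile
import zipfile


def list_sdist(fileobj):
    """Return the member names of a gzip-compressed tar source distribution."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return archive.getnames()


def list_wheel(fileobj):
    """Return the member names of a wheel, which is an ordinary zip archive."""
    with zipfile.ZipFile(fileobj) as archive:
        return archive.namelist()


# Build a small stand-in "wheel" in memory so the sketch runs without a real build.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("myadvisor/__init__.py", "")
    archive.writestr("myadvisor/main.py", "print('hello')\n")
buffer.seek(0)
print(list_wheel(buffer))
```

With real build output, the same functions could be given files opened in binary mode from the dist subdirectory.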

A source distribution is an older Python packaging format. A wheel distribution is a newer one, and it's quicker to install. Even for pure Python packages it is quicker to install, because there is no build step; the package is pretty much just put in the right place on the user's system. But it's especially faster for packages with native extensions. If you have ever used any Python packages for data science, such as NumPy or SciPy, they have lots of built C extensions. If you have to build them on your system, it takes a really long time. If you're using a wheel to install the package (and by default pip will actually use a wheel if one is available for your system architecture), it goes much, much faster. So let's take a look at the name of our wheel distribution. First of all, it's going to include the package name, then the package version, then a Python tag. In my case, it's tagged for Python 2 and Python 3 because I used the universal flag. Then the application binary interface: here I'm not targeting any specific binary interface, so it says none. And then a platform tag.
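As a small illustration of the naming scheme just described, the following sketch splits a wheel filename into its parts. The example filename is an assumption based on the package name, version, and universal flag mentioned in the talk; `parse_wheel_filename` is a hypothetical helper that handles only the simple five-part form and ignores the optional build tag that wheel names may also carry.

```python
from collections import namedtuple

WheelName = namedtuple("WheelName", "name version python_tag abi_tag platform_tag")


def parse_wheel_filename(filename):
    """Split a simple wheel filename (no build tag) into its five components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: {!r}".format(filename))
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 dash-separated parts, got {}".format(len(parts)))
    return WheelName(*parts)


# Assumed filename for a universal, pure-Python build of the demo package.
print(parse_wheel_filename("myadvisor-0.0.1-py2.py3-none-any.whl"))
```

For that filename, the Python tag is `py2.py3`, the ABI tag is `none`, and the platform tag is `any`.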

And again, it says any because I'm not targeting any specific operating system. So the next thing to do would be to upload it to the PyPI server. I'm going to create an account. First of all, you want to create an account on the test server, not the actual Python packaging server. So you would go to test.pypi.org and register an account. As usual, you would provide your email address, and it would send you a link to verify your email. Once you have that, you can already upload your package to PyPI. But I'm going to create a new API token here, because it's always better to use a token and not a login and password on your system, since you have to store it in a text file. It's better to store a token in a file than your login and password, because people can take a look at your login and password in a plain text file, and that's not a good thing. So we're going to go and configure the test server in a special file called .pypirc. This special file lives in your home directory. This is how a .pypirc file looks.

First, you create an entry for the Test PyPI server, and you're going to say the username is __token__, because you're going to use a token and not a login and password. Then, in place of a password, you provide the actual token that you received on the PyPI website. I didn't put an actual token here. As the next step, we would upload our package to the PyPI server. For that, we need yet another Python tool, and that Python tool is called Twine. I'm going to run pip install twine, which will install the twine package. Then I'm going to call twine upload with all files from my dist subdirectory, and I'm going to say the repository is testpypi, which is the one I just configured. So it's the same name as in my .pypirc file. OK, so once I've done that, and hopefully it uploaded successfully, I'm going to pip install my package just to test it and make sure it works. I'm going to pip install myadvisor, providing the index URL of test.pypi.org. You only need this index URL if you install from the test server. Otherwise, pip knows the URL of the actual Python packaging server, and you would normally just say pip install and the package name.
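The .pypirc slide is not in the transcript, so the sample below is a sketch of what such a file commonly looks like, parsed with Python's `configparser` to show its structure. The section name `testpypi` matches the repository name used in the talk; the password value is an obvious placeholder, and `read_repository` is a hypothetical helper.

```python
import configparser

# Sample ~/.pypirc contents; the password is a placeholder, not a real token.
SAMPLE_PYPIRC = """\
[distutils]
index-servers =
    testpypi

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = pypi-REPLACE-WITH-YOUR-TOKEN
"""


def read_repository(text, name):
    """Return the settings of one repository section as a plain dict."""
    parser = configparser.RawConfigParser()  # raw: no % interpolation in token values
    parser.read_string(text)
    return dict(parser[name])


settings = read_repository(SAMPLE_PYPIRC, "testpypi")
print(settings["repository"], settings["username"])
```

With a file like this in place, the upload step from the talk would look something like `twine upload --repository testpypi dist/*`, and the test install something like `pip install --index-url https://test.pypi.org/simple/ myadvisor`.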

So pip install myadvisor, and then once I've installed it, I can run it in the terminal and see today's message. As you can see, I run myadvisor. So it created the shell command for me; I don't need to run python main.py anymore. Let's talk a little bit about the license. A lot of people use the MIT or GNU GPLv3 license for their open source packages, and these are the two on my slide here. The MIT license is very popular. It pretty much allows everybody to do anything they want with your source code. So people can copy, distribute, or modify your source code. You basically give up all the rights; you don't want to keep any copyright. GPLv3 is another very commonly used one. With it, people cannot close the source. With the MIT license, people can actually compile your code and distribute it precompiled, so nobody else can change the source. With GNU GPLv3, this is not allowed. Any time you distribute the code, you have to provide the actual source and not just the built distribution, because people may want to modify it. And you might say, OK, I don't care what people do with my code, I don't want to worry about the license, I'm going to provide no license. That's actually not a good option for an open source package, because when you include no license, by default it means people don't have any permissions to do anything with your code.

They can't copy, distribute, or modify it. If you are developing your package inside a community, let's say your package is part of the Apache community, for example, the community may already have a preferred license and you don't have to worry about picking one. But if you do want to pick a license for your own package, you can go to choosealicense.com. It has a very nice interactive, wizard-type web page where you can specify your requirements and pick the type of license that fits your package best. You can also see a huge comparison table of all the license options available. Let's talk a bit more about optional but useful things. So, the changelog. What is a changelog? It's useful for the users of your application but also for developers. You would track all the changes, all the bug fixes and new features, by date and version. And you can understand how it's useful, right? Some user is having a problem with your application, something is not working correctly. They can take a look at the changelog and see: do I have the latest version?

Perhaps my issue is already fixed in the latest version, and they can see which version they need to install. Perhaps they currently have 1.0 and the bug is fixed in version 2.0, right? So they just need to install 2.0. For contributors, you might want to keep a list of authors in the authors file, for copyright purposes or simply to give people credit. And for developers, you might want to have a contributing guide. It's pretty much a text file where you would describe, to people who want to work on your application, how best to do that. First of all, you would probably want to put environment setup into that file. Also, whether you have a specific code style, whether you have a Git workflow you want people to conform to, how you are going to review PRs, what the requirements are for adding tests, what frameworks you use for testing, and things like that. So this file is useful for developers. Next, if you have any Python dependencies which are not part of the standard library, you would provide a requirements.txt. requirements.txt usually serves as a list of requirements that your application needs, so your dependencies. And requirements-dev.txt is a file that you would put in your package if you want to give your developers a list of what you are using to test, build, or publish your package.

An easy way to create a list of requirements with Python is the pip freeze command, and an easy way to install requirements from the requirements.txt file is pip install -r. OK, so now you pretty much know everything about creating and building a Python package. So let's summarize. What did it take? We wrote some code. We added the setup file with some metadata. We added a license and documentation, the readme file. We had to install three tools, setuptools, wheel, and twine, and we ran just two commands, python setup.py and twine upload. And there is no magic whatsoever in the whole process. I have a short demo, so I'm going to share my other screen here. OK, so here is a demo of me working on my package and uploading the package to the PyPI server. I just created and activated a virtual environment, which is just part of a development workflow. Now I'm going to install setuptools, wheel, and twine. It's going to run for a bit, and that was pretty quick. And I'm going to update the version in my setup.py file, because you cannot re-upload the package to the server with the same version. Same name and same version is not going to work, and I already uploaded it before. OK.

So you can see I have two distribution files here, and next I'm going to call this twine upload command. It's going, and it's done, very quickly. I just deactivated my environment because I want to install my package from the test server in a clean environment, and there it goes. OK, and that's it. That's the whole demo. It runs in about one minute. OK, let me share my other presentation. The code that you saw in this demo, I shared it on GitHub. Here's the link. I'm going to stay on this slide for a little bit so you can copy it if you like. My GitHub login is j-bennet, J dash Bennet, and the package name, or the repo name, is myadvisor. And here are some useful links. The first link is the link to my slides; this is a short link. The second link is the link to my demo that I just showed you, and then there are some documentation links: to the Python packaging guide, to the documentation about the Python package installer, and to Python wheels. Those are optional, if you want more information about Python packaging. You don't actually need to know any of that to build and publish your own package. As you saw, a few commands, that's all it takes. Here is my contact information.

If you want to contact me, find me on Twitter or send me an email. OK, I'm going to switch back to the links slide, but that's it. Right now, we have time for questions. Let me take a look. Did you have any questions that I didn't answer yet? "Rest in peace, Python 2." Yes. "Python is pure magic." Yes. Oh, was the font too small to read? Well, there is a link to my demo, so you can actually take a look for yourself and full-screen it for yourself. I think that's better. Oh, it needs access. OK, I'm going to check after this demo and make sure that it's open for public access. So the link to the slides is the is.gd short link. Was anybody able to open that? The link works? OK, good. "Do you have Kaggle?" Nope, I don't have Kaggle. "So it's like package.json in JavaScript?" What is like package.json in JavaScript? The setup.py? I think so. OK. "Which part of this process are developers likely to experience friction or obstacles during?" That's a good question. First of all, people may not know to build a wheel, or why they need to build a wheel, and just build an sdist, and that makes it less efficient for other people to install their application.

Second, you might experience problems if you named your package the same as some other package, or if you're trying to upload the same version because you forgot to update the version. Now the slides should not require requesting access. Huh, so it works for some people but not for other people. OK, got that. I'm going to make it public after this demo. This is silly. Let me see. OK, can you try the short link again, guys? And can you also try the link in the chat right now? The long link works? OK, use that. Does the short link work as well, or not? Perfect. OK, great. Do you guys have any more questions? OK, if there are no more questions, then I'm going to give you the rest of the time, which is about three minutes; you can have them back. Oh, "If I'm new and learning?" Yes, absolutely, you can upload a package to the test server first. It works the same way as the regular Python packaging server. And after a few weeks, I think, the packages are actually deleted from the test server. So if you are worried about people installing them by mistake when they are broken or something, don't worry, they're going to be cleared from the test server and they won't be there after a short time.

And the Test PyPI server and the real PyPI server require two different logins, but they work absolutely the same in terms of the workflow. The summary slide, let's switch to the summary slide. OK: some code, some metadata, some documentation and a license, install three tools (setuptools, wheel, and twine), and run two commands. "Is there any cost when we register on Test PyPI?" No, it's free, and the regular PyPI is free as well. And it's usually a good idea, before you create a package, to search whether a package similar to what you want to do already exists. You can search by name or by description. You go to pypi.org, I think it's .org, and just do a search. "How can we be as cool as you?" Thank you for the compliment. "Can we use Test PyPI with data science projects?" If your project is a Python package, you can absolutely upload it to PyPI. "What is the first package you ever made?" My first ever open source application is called wharfee. It's still on GitHub and on PyPI; you can install it. It's a CLI for Docker. Do I still have a website? Yes. OK, I have an actual website for my first package. "What was my best regular expression?" Oh, my goodness. It's really hard to remember just one. You know, I wrote a lot of regular expressions in my career.

Sorry, can't answer that one. OK, I think we only have a few seconds left. Wait. "For how many years have you worked with Python?" For the last six years. Python is a very easy language to learn; I actually learned it by myself. OK, I think I have to say goodbye because the session is going to end. Bye, everyone.