Difference between pages "POSIX Threads Explained, Part 1" and "Hostname"

From Funtoo
(Difference between pages)
Jump to navigation Jump to search
 
 
Line 1: Line 1:
{{Article
w.i.p
|Summary=POSIX (Portable Operating System Interface) threads are a great way to increase the responsiveness and performance of your code. In this series, Daniel Robbins shows you exactly how to use threads in your code. A lot of behind-the-scenes details are covered, so by the end of this series you'll really be ready to create your own multithreaded programs.
==Introduction==
|Author=Drobbins
A hostname is a unique name created to identify a machine on a network. In computer networking, a hostname  is a label that is assigned to a device connected to a computer network and that is used to identify the device in various forms of electronic communication such as the World Wide Web, e-mail or Usenet. Hostnames may be simple names consisting of a single word or phrase, or they may be structured.
|Next in Series=POSIX Threads Explained, Part 2
==Configuration==
}}
In Funtoo Linux <code>/etc/conf.d/hostname</code> is the only configuration file for setting a hostname. In OpenRC framework <code>/etc/conf.d/foo</code> is the configuration file for a corresponding Init script <code>/etc/init.d/foo</code>. With the case of hostname, default value in <code>/etc/conf.d/hostname</code> is set to ''localhost'', means when system boots and OpenRC's <code>/etc/init.d/hostname</code> script started a hostname getting only ''localhost'' name. How it looks?  In your shell promt this will look in following way, an example for root:
== Threads are fun ==
<console>
 
localhost ~ # ##i## Hello :)
Knowing how to properly use threads should be part of every good programmer's repertoire. Threads are similar to processes. Threads, like processes, are time-sliced by the kernel. On uniprocessor systems the kernel uses time slicing to simulate simultaneous execution of threads in much the same way it uses time slicing with processes. And, on multiprocessor systems, threads are actually able to run simultaneously, just like two or more processes can.
</console>
 
Let's play a bit with a configuration. Open <code>/etc/conf.d/hostname</code> with your favorite editor and set a hostname of your choice. Below, I will use a real examples  from one of my working test boxes.
So why is multithreading preferable to multiple independent processes for most cooperative tasks? Well, threads share the same memory space. Independent threads can access the same variables in memory. So all of your program's threads can read or write the declared global integers. If you've ever programmed any non-trivial code that uses fork(), you'll recognize the importance of this tool. Why? While fork() allows you to create multiple processes, it also creates the following communication problem: how to get multiple processes, each with their own independent memory space, to communicate. There is no one simple answer to this problem. While there are many different kinds of local IPC (inter-process communication), they all suffer from two important drawbacks:
<console>
 
localhost ~ # ##i## nano /etc/conf.d/hostname
* They impose some form of additional kernel overhead, lowering performance.
</console>
* In almost all situations, IPC is not a "natural" extension of your code. It often dramatically increases the complexity of your program.
Let's set it to hostname="oleg-stable.host.funtoo.org". Save the file and restart  a hostname service:
 
<console>
 
localhost ~ # ##i## service hostname restart
Double bummer: overhead and complication aren't good things. If you've ever had to make massive modifications to one of your programs so that it supports IPC, you'll really appreciate the simple memory-sharing approach that threads provide. POSIX threads don't need to make expensive and complicated long-distance calls because all our threads happen to live in the same house. With a little synchronization, all your threads can read and modify your program's existing data structures. You don't have to pump the data through a file descriptor or squeeze it into a tight, shared memory space. For this reason alone you should consider the one process/multithread model rather than the multiprocess/single-thread model.
</console>
 
Now, let's examine our changes, after a restarting a hostname
== Threads are nimble ==
<console>
 
oleg-stable ~ # ##i## Hello :)
But there's more. Threads also happen to be extremely nimble. Compared to a standard fork(), they carry a lot less overhead. The kernel does not need to make a new independent copy of the process memory space, file descriptors, etc. That saves a lot of CPU time, making thread creation ten to a hundred times faster than new process creation. Because of this, you can use a whole bunch of threads and not worry too much about the CPU and memory overhead incurred. You don't have a big CPU hit the way you do with fork(). This means you can generally create threads whenever it makes sense in your program.
</console>
 
== Diving deeper==
Of course, just like processes, threads will take advantage of multiple CPUs. This is a really great feature if your software is designed to be used on a multiprocessor machine (if the software is open source, it will probably end up running on quite a few of these). The performance of certain kinds of threaded programs (CPU-intensive ones in particular) will scale almost linearly with the number of processors in the system. If you're writing a program that is very CPU-intensive, you'll definitely want to find ways to use multiple threads in your code. Once you're adept at writing threaded code, you'll also be able to approach coding challenges in new and creative ways without a lot of IPC red tape and miscellaneous mumbo-jumbo. All these benefits work synergistically to make multithreaded programming fun, fast, and flexible.
Notice, that in above output we seeing a shortened hostname and not a FQDN (Fully Qualified Domain Name). Don't be frustrated. This is  how  default bash promt <code>PS1</code> set. To get nice promts, please, follow http://www.funtoo.org/Prompt_Magic
 
Another way to test our settings is using a '''hostname''' command. Here we will show only  some of it's features. Let's try to execute '''hostname''' command:
== I think I'm a clone now ==
<console>
 
oleg-stable ~ # ##i## hostname
If you've been in the Linux programming world for a while, you may know about the <code>__clone()</code> system call. <code>__clone()</code> is similar to <code>fork()</code>, but allows you to do lots of things that threads can do. For example, with {{c|__clone()}} you can selectively share parts of your parent's execution context (memory space, file descriptors, etc.) with a new child process. That's a good thing. But there is also a not-so-good thing about {{c|__clone()}}. As the {{c|__clone()}} man page states:
oleg-stable.host.funtoo.org
 
</console>
<blockquote>
Now we see our fully qualified domain name hostname just how we configured it in <code>/etc/conf.d/hostname</code> in above paragraph. To get a short hostname we need to set '''-s ''' (short) argument to hostname command.  
"The __clone call is Linux-specific and should not be used in programs
intended to be portable. For programming threaded applications (multiple
threads of control in the same memory space), it is better to use a library
implementing the POSIX 1003.1c thread API, such as the Linux-Threads
library. See pthread_create(3thr)."
</blockquote>
 
So, while {{c|__clone()}} offers many of the benefits of threads, it is not portable. That doesn't mean you shouldn't use it in your code. But you should weigh this fact when you are considering using {{c|__clone()}} in your software. Fortunately, as the __clone() man page states, there's a better alternative: POSIX threads. When you want to write portable multithreaded code, code that works under Solaris, FreeBSD, Linux, and more, POSIX threads are the way to go.
 
== Beginning threads ==
 
Here's a simple example program that uses POSIX threads:
 
{{file|name=thread1.c|lang=c|desc=Sample program using POSIX threads|body=
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
 
void *thread_function(void *arg) {
  int i;
  for ( i=0; i<20; i++ ) {
    printf("Thread says hi!\n");
    sleep(1);
  }
  return NULL;
}
 
int main(void) {
 
  pthread_t mythread;
 
  if ( pthread_create( &mythread, NULL, thread_function, NULL) ) {
    printf("error creating thread.");
    abort();
  }
 
  if ( pthread_join ( mythread, NULL ) ) {
    printf("error joining thread.");
    abort();
  }
 
  exit(0);
 
}
}}
 
To compile this program, simply save this program as {{c|thread1.c}}, and type:
 
<console>
<console>
$ ##i##gcc thread1.c -o thread1 -lpthread
oleg-stable ~ # ##i## hostname -s
oleg-stable
</console>
</console>
 
Good! Hostname offers more then just displaying a system host name but can also set one. Let's try:
Run it by typing:
 
<console>
<console>
$ ##i##./thread1
oleg-stable ~ # ##i## hostname foo.bar.baz
oleg-stable ~ # ##i## hostname
foo.bar.baz
</console>
</console>
As you can see, we changed a hostname on-the-fly. This is not recommended way.
{{fancywarning|Please, notice that using '''hostname''' command to configure will work temporary for a current session and will be reverted back to a value set in <code>/etc/conf.d/hostname</code> file with next system restart.}}


== Understanding thread1.c ==
Now that we got a brief description of a hostname and basic configuration steps, its time to reflect another important case which is directly related to a Funtoo Linux hostname generation, a hosts.
 
{{c|thread1.c}} is an extremely simple threaded program. While it doesn't do anything useful, it'll help you understand how threads work. Let's take a step-by-step look at what this program does. In {{c|main()}} we first declare a variable called {{c|mythread,}} which has a type of {{c|pthread_t}}. The pthread_t type, defined in pthread.h, is often called a "thread id" (commonly abbreviated as "tid"). Think of it as a kind of thread handle.
 
After mythread is declared (remember that mythread is just a "tid", or a handle to the thread we are about to create), we call the {{c|pthread_create}} function to create a real, living thread. Don't be confused by the fact that pthread_create() is inside an "if" statement. Since pthread_create() returns zero on success and a non-zero value on failure, placing the function call in an if() is just an elegant way of detecting a failed pthread_create() call. Let's look at the arguments to pthread_create. The first one is a pointer to mythread, {{c|&mythread}}. The second argument, currently set to NULL, can be used to define certain attributes for our thread. Because the default thread attributes work for us, we simply set it to NULL.
 
Our third argument is the name of a function that the new thread will execute when it starts. In this case the name of the function is thread_function(). When this thread_function() returns, our new thread will terminate. In this example our thread function doesn't do anything remarkable. It just prints out "Thread says hi!" 20 times and then exits. Notice that thread_function() accepts a void * as an argument and also returns a void * as a return value. This shows us that it's possible to use a void * to pass an arbitrary piece of data to our new thread, and that our new thread can return an arbitrary piece of data when it finishes. Now, how do we pass our thread an arbitrary argument? Easy. We use the fourth argument to the pthread_create() call. In this example, we set it to NULL because we don't need to pass any data into our trivial thread_function().
 
As you may have guessed, the program will consist of two threads after pthread_create() successfully returns. Wait a minute, two threads? Didn't we just create a single thread? Yes, we did. But our main program is also considered a thread. Think of it this way: if you write a program and don't use POSIX threads at all, the program will be single-threaded (this single thread is called the "main" thread). By creating a new thread we now have a total of two threads in our program.
 
I imagine you have at least two important questions at this point. The first is what the main thread does after the new thread is created. It keeps on going and executes the next line of our program in sequence (in this case that line is "if ( pthread_join(...))"). The second question you may be wondering about is what happens to our new thread when it exits. It stops and waits to be merged or "joined" with another thread as part of its cleanup process.
 
OK, now on to {{c|pthread_join()}}. Just as pthread_create() split our single thread into two threads, pthread_join() merges two threads into a single thread. The first argument to pthread_join() is our tid mythread. The second argument is a pointer to a void pointer. If the void pointer is non-NULL, pthread_join will place our thread's void * return value at the location we specify. Since we don't care about thread_function()'s return value, we set it to NULL.
 
You'll notice that our thread_function() takes 20 seconds to complete. Long before thread_function() completes, our main thread has already called pthread_join(). When this occurs our main thread will block (go to sleep) and wait for thread_function() to complete. When thread_function() completes, pthread_join() will return. Now our program has one main thread again. When our program exits, all new threads have been pthread_join()ed. This is exactly how you should deal with every new thread you create in your programs. If a new thread isn't joined it will still count against your system's maximum thread limit. This means that not doing the proper cleanup will eventually cause new pthread_create() calls to fail.
 
== No parents, no children ==
 
If you've used the fork() system call you're probably familiar with the concept of parent and child processes. When a process creates another new process, using fork(), the new process is considered the child and the original process is considered the parent. This creates a hierarchical relationship that can be handy, especially when waiting for child processes to terminate. The {{c|waitpid()}} function, for example, will tell the current process to wait for the termination of any child processes. waitpid() is used to implement a simple cleanup routine in your parent process.


Things are a little more interesting with POSIX threads. You may have noticed that I have intentionally avoided using the terms "parent thread" and "child thread" so far. That's because with POSIX threads this hierarchical relationship doesn't exist. While a main thread may create a new thread, and that new thread may create an additional new thread, the POSIX threads standard views all your threads as a single pool of equals. So the concept of waiting for a child thread to exit doesn't make sense. The POSIX threads standard does not record any "family" information. This lack of genealogy has one major implication: if you want to wait for a thread to terminate, you need to specify which one you are waiting for by passing the proper tid to pthread_join(). The threads library can't figure it out for you.
==Hosts case==
 
As per man page <code>hosts</code> stands for static table lookup for hostnames and it's configuration file is <code>/etc/hosts</code>. Here is how it looks
To many people this isn't very good news because it can complicate programs that consist of more than two threads. Don't let it worry you. The POSIX threads standard provides all the tools you need to manage multiple threads gracefully. Actually, the fact that there is no parent/child relationship opens up some creative ways to use threads in our programs. For example, if we have a thread called thread 1, and thread 1 creates a thread called thread 2, it's not necessary for thread 1 itself to call pthread_join() for thread 2. Any other thread in the program can do that. This allows for interesting possibilities when you write heavily multithreaded code. You can, for example, create a global "dead list" that contains all stopped threads and then have a special cleanup thread that simply waits for an item to be added to the list. The cleanup thread calls pthread_join() to merge it with itself. Now your entire cleanup will be handled neatly and efficiently in a single thread.
{{file|name=/etc/hosts|body=
 
# Auto-generated hostname. Please do not remove this comment.
== Synchronized swimming ==
127.0.0.1       oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain
 
::1             oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain
Now it's time to take a look at some code that does something a little unexpected. Here's thread2.c:
 
{{file|name=thread2.c|lang=c|body=
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
 
int myglobal;
 
void *thread_function(void *arg) {
  int i,j;
  for ( i=0; i<20; i++ ) {
    j=myglobal;
    j=j+1;
    printf(".");
    fflush(stdout);
  sleep(1);
    myglobal=j;
  }
  return NULL;
}
 
int main(void) {
 
  pthread_t mythread;
  int i;
 
  if ( pthread_create( &mythread, NULL, thread_function, NULL) ) {
    printf("error creating thread.");
    abort();
  }
 
  for ( i=0; i<20; i++) {
    myglobal=myglobal+1;
    printf("o");
    fflush(stdout);
    sleep(1);
  }
 
  if ( pthread_join ( mythread, NULL ) ) {
    printf("error joining thread.");
    abort();
  }
 
  printf("\nmyglobal equals %d\n",myglobal);
 
  exit(0);
 
}
}}
}}
As you can see it has entries from our <code>/etc/conf.d/hostname</code>. As you may have guessed, in Funtoo Linux <code>/etc/hosts</code> file entries are auto-generated, when OpenRC hostname service starts. Previously, it is used to edit <code>/etc/hosts</code> manually. In Funtoo Linux there is no such need.


== Understanding thread2.c ==
What about custom hosts entries? This can be easily configured with 'aliases'. For example you want to have a hosts for your remote router or a computer in home LAN. Let's try to modify <code>/etc/conf.d/hostname with adding following - my remote computer oleg.distant.home has an IP 10.1.1.2:
 
<console>
This program, like our first one, creates a new thread. Both the main thread and the new thread then increment a global variable called myglobal 20 times. But the program itself produces some unexpected results. Compile it by typing:
oleg-stable ~ # ##i## echo 'aliases="10.1.12 oleg.distant.home"' >> /etc/conf.d/hostname
 
oleg-stable ~ # ##i## service hostname restart
<console>
$ ##i##gcc thread2.c -o thread2 -lpthread
</console>
</console>
 
Examine our changes:
And run it:
 
<console>
<console>
$ ##i##./thread2
oleg-stable ~ # ##i## cat /etc/hosts
..o.o.o.o.oo.o.o.o.o.o.o.o.o.o..o.o.o.o.o
myglobal equals 21
</console>
</console>
 
{{file|name=/etc/hosts|body=
Quite unexpected! Since myglobal starts at zero, and both the main thread and the new thread each increment it by 20, we should see myglobal equaling 40 at the end of the program. Since myglobal equals 21, we know that something fishy is going on here. But what is it?
# Auto-generated hostname. Please do not remove this comment.
 
10.1.1.2        oleg.distant.home
Give up? OK, I'll show you why it happens. Take a look at thread_function(). Notice how we copy myglobal into a local variable called "j"? And see how we increment j, then sleep for one second, and only then copy our new j value into myglobal? That's the key. Imagine what happens if our main thread increments myglobal just after our new thread copies the value of myglobal into j. When thread_function() writes the value of j back to myglobal, it will overwrite the modification that our main thread made.
127.0.0.1      oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain
 
::1             oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain
When writing threaded programs, you want to avoid useless side effects like the one we just looked at because they're a waste of time (except when you are writing articles on POSIX threads, of course :). Now, what can we do to eliminate this hassle?
}}
 
Since the problem occurs because we copy myglobal to j and hold it there for one second before writing it back, we could try to avoid using a temporary local variable and incrementing myglobal directly. While this solution will probably work in this particular example, it is not correct. And if we were performing a relatively complex mathematical operation on myglobal instead of just incrementing it, it would certainly fail. But why?
 
To understand the problem, you need to remember that threads run simultaneously. Even on a uniprocessor machine (where the kernel uses time slicing to simulate true multitasking) we can, from a programmer's perspective, imagine both threads as executing at the same time. thread2.c has problems because the code in thread_function() relies on the fact that myglobal will not be modified during the ~1 second before it is incremented. We need some way for one thread to tell the other to "hold off" while it's making changes to myglobal. I'll show you exactly how to do this in my next article. See you then.
{{ArticleFooter}}

Revision as of 16:04, February 24, 2015

w.i.p

Introduction

A hostname is a unique name created to identify a machine on a network. In computer networking, a hostname is a label that is assigned to a device connected to a computer network and that is used to identify the device in various forms of electronic communication such as the World Wide Web, e-mail or Usenet. Hostnames may be simple names consisting of a single word or phrase, or they may be structured.

Configuration

In Funtoo Linux /etc/conf.d/hostname is the only configuration file for setting a hostname. In OpenRC framework /etc/conf.d/foo is the configuration file for a corresponding Init script /etc/init.d/foo. With the case of hostname, default value in /etc/conf.d/hostname is set to localhost, means when system boots and OpenRC's /etc/init.d/hostname script started a hostname getting only localhost name. How it looks? In your shell promt this will look in following way, an example for root:

localhost ~ #  Hello :)

Let's play a bit with a configuration. Open /etc/conf.d/hostname with your favorite editor and set a hostname of your choice. Below, I will use a real examples from one of my working test boxes.

localhost ~ #  nano /etc/conf.d/hostname

Let's set it to hostname="oleg-stable.host.funtoo.org". Save the file and restart a hostname service:

localhost ~ #  service hostname restart

Now, let's examine our changes, after a restarting a hostname

oleg-stable ~ #  Hello :)

Diving deeper

Notice, that in above output we seeing a shortened hostname and not a FQDN (Fully Qualified Domain Name). Don't be frustrated. This is how default bash promt PS1 set. To get nice promts, please, follow http://www.funtoo.org/Prompt_Magic Another way to test our settings is using a hostname command. Here we will show only some of it's features. Let's try to execute hostname command:

oleg-stable ~ #  hostname
oleg-stable.host.funtoo.org

Now we see our fully qualified domain name hostname just how we configured it in /etc/conf.d/hostname in above paragraph. To get a short hostname we need to set -s (short) argument to hostname command.

oleg-stable ~ #  hostname -s
oleg-stable

Good! Hostname offers more then just displaying a system host name but can also set one. Let's try:

oleg-stable ~ #  hostname foo.bar.baz
oleg-stable ~ #  hostname 
foo.bar.baz

As you can see, we changed a hostname on-the-fly. This is not recommended way.

   Warning

Please, notice that using hostname command to configure will work temporary for a current session and will be reverted back to a value set in /etc/conf.d/hostname file with next system restart.

Now that we got a brief description of a hostname and basic configuration steps, its time to reflect another important case which is directly related to a Funtoo Linux hostname generation, a hosts.

Hosts case

As per man page hosts stands for static table lookup for hostnames and it's configuration file is /etc/hosts. Here is how it looks

   /etc/hosts
# Auto-generated hostname. Please do not remove this comment.
127.0.0.1       oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain
::1             oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain

As you can see it has entries from our /etc/conf.d/hostname. As you may have guessed, in Funtoo Linux /etc/hosts file entries are auto-generated, when OpenRC hostname service starts. Previously, it is used to edit /etc/hosts manually. In Funtoo Linux there is no such need.

What about custom hosts entries? This can be easily configured with 'aliases'. For example you want to have a hosts for your remote router or a computer in home LAN. Let's try to modify /etc/conf.d/hostname with adding following - my remote computer oleg.distant.home has an IP 10.1.1.2:

oleg-stable ~ #  echo 'aliases="10.1.12 oleg.distant.home"' >> /etc/conf.d/hostname
oleg-stable ~ #  service hostname restart
Examine our changes:
oleg-stable ~ #  cat /etc/hosts
   /etc/hosts
# Auto-generated hostname. Please do not remove this comment.
10.1.1.2        oleg.distant.home
127.0.0.1       oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain
::1             oleg-stable.host.funtoo.org oleg-stable localhost localhost.localdomain