Difference between pages "Java Configuration Design Update" and "Web-server-stack"

m (Benchmarking: MOAR)
 
Line 1: Line 1:
This page describes a potential update to the Java support code in Gentoo and Funtoo Linux, with the intention of simplifying <tt>java-config</tt> code and also making it more correct and better integrated with Portage itself. The proposal intends to address deficiencies in the current design of <tt>java-utils-2.eclass</tt> while maintaining its useful features and avoiding a full rewrite. Rather than replace the current system with more code, the goal is to simplify the existing code while retaining its functionality and allowing a graceful migration of advanced functionality into Portage itself.
 
  
{{fancytip|Please post comments on the [[{{TALKPAGENAME}}|Discussion]] page.}}
+
== Pre-install considerations ==
 +
=== ssl ===
 +
SSL [http://en.wikipedia.org/wiki/Wildcard_certificate wildcard certificates] allow one certificate to cover several subdomain names.  For example, https://wiki.funtoo.org, https://www.funtoo.org and https://forums.funtoo.org can all use the same certificate.  https://funtoo.org itself would not be covered by the wildcard, so [[User:Threesixes|Threesixes]] ([[User talk:Threesixes|talk]]) suggests using http://domain.tld as a plain HTTP navigation splash page.  All that is required to set up a CA-signed SSL certificate is an email address on the server.  https://www.startssl.com offers free CA-signed SSL certificates, though there are several other certificate [http://en.wikipedia.org/wiki/Certificate_authority#Providers providers]. Many web apps require you to set your URL, and will have problems if it is set to http:// rather than https://.
  
== Design Challenges ==
+
=== sockets vs tcp stack ===
 +
Unix sockets have less overhead but cannot be shared across jails or with other machines.  TCP has more overhead but is far more flexible.
  
The current Java eclasses have some very interesting and powerful features, yet are lacking in some areas.
+
=== Email Servers ===
 +
* {{Package|mail-mta/postfix}}  <-- suggested
 +
* {{Package|mail-mta/ssmtp}}
 +
* {{Package|mail-mta/exim}}
 +
* {{Package|mail-mta/sendmail}}
 +
* {{Package|mail-mta/nullmailer}}
  
=== Dependency Handling ===
+
=== FTP Servers ===
 +
It is common practice to use FTP servers to host files for downloading.
  
Currently, <tt>java-utils-2.eclass</tt> uses <tt>depend-java-query</tt> to automatically select the most optimal Java VM for building. This is a sophisticated feature that is intended to be 'smart', yet it has some flaws. These flaws are fixable:
+
* {{Package|net-ftp/vsftpd}} <-- suggested
 +
* {{Package|net-ftp/proftpd}}
 +
* {{Package|net-ftp/pure-ftpd}}
 +
* {{Package|net-ftp/qshare}}
  
* While JVM auto-selection is smart, it is not always correct -- it doesn't use Portage's API, but instead uses regexes to parse the currently-executing ebuild's <tt>DEPEND</tt> string.  
+
== Webserver ==
 +
Web servers come in several varieties.  The most common stack is known as LAMP, which stands for Linux, Apache, MySQL, PHP.  [[User:Threesixes|Threesixes]] ([[User talk:Threesixes|talk]]) suggests setting up the web server stack by selecting the database first, the scripting language second, and the web server third.
  
* Related to the regex issue, the code in <tt>depend-java-query</tt> is complex as it tries to duplicate some functionality that could be handled better by Portage's API.
+
=== Databases ===
 +
* {{Package|dev-db/mysql}}
  
This proposal provides a mechanism for the Portage API to be used to always generate a completely correct result. This may seem like a minimal optimization since the existing code is correct ''most'' of the time. But it is duplicative, in that it does a rough approximation of what the Portage API could do absolutely correctly. So there is no good reason to keep it -- less code means more maintainability.
+
MariaDB is a drop-in replacement for MySQL.
 +
* {{Package|dev-db/mariadb}} <-- suggested
  
{{fancyimportant|Moreover, there is an even better reason for doing things the right way -- eclasses should as much as possible be able to behave as "first-class citizens" in the world of Portage. This is an important architectural goal, and there are severe costs for not pursuing it, namely having every eclass under the sun re-implement various parts of Portage, albeit poorly, creating much unnecessary code and frustration!}}
+
Percona Server is a drop-in replacement for MySQL.
 +
* {{Package|dev-db/percona-server}}
  
=== Atoms ===
+
* {{Package|dev-db/postgresql-server}}
 +
* {{Package|dev-db/sqlite}}
  
Currently, <tt>eselect java-vm</tt> and <tt>java-config</tt> have their own "atoms" that they use to identify Java virtual machines, which seems to be <tt>${PN}-${SLOT}</tt>. Yet Portage already has atoms, and we should try to allow support for them over time. This proposal does require there to be some "link" between the existing JVM atom and the Portage atom. This link currently does not exist.
+
=== Languages ===
 +
* {{Package|dev-lang/php}} <-- suggested
 +
* {{Package|dev-lang/perl}}
 +
* {{Package|dev-lang/python}}
  
== Proposal ==
+
=== Web Servers ===
 +
* {{Package|www-servers/apache}}
 +
* {{Package|www-servers/cherokee}}
 +
* {{Package|www-servers/nginx}} <-- suggested
 +
* {{Package|www-servers/tengine}}
 +
* {{Package|www-servers/lighttpd}}
  
=== java-config Settings ===
 
  
This proposal involves a minor upgrade to the java-config settings for a JVM -- adding a <tt>PORTAGE_ATOM</tt> setting to the file, which allows the <tt>java-config</tt> atom to be linked to the Portage atom:
+
=== SSL Termination, Reverse Proxies, & load balancing ===
 +
Reverse proxies are useful: some cache static data and serve cached pages rather than hitting the web server.  Some pass requests to backend nodes, providing high-availability clustering for your website.  Some web servers have this functionality built in.
  
{{file|name=/usr/share/java-config-2/vm/oracle-jdk-bin-1.7|desc=Java-config settings|lang=bash|body=
+
* {{Package|www-servers/nginx}}
# Copyright 1999-2011 Gentoo Foundation
+
* {{package|net-misc/stunnel}}
# Distributed under the terms of the GNU General Public License v2
+
* {{package|www-servers/pound}} <-- suggested for ssl termination & load balancing
# $Header: /var/cvsroot/gentoo-x86/dev-java/oracle-jdk-bin/files/oracle-jdk-bin-1.7.env,v 1.2 2011/11/17 22:49:56 caster Exp $
+
* {{Package|www-servers/varnish}} <-- suggested for caching to reduce power consumption & reduce the need of constantly rebuilding pages
 +
* {{Package|net-proxy/squid}}
  
VERSION="Oracle JDK 1.7.0.60"
+
== Post install ==
JAVA_HOME="/opt/oracle-jdk-bin-1.7.0.60"
+
There are several considerations to take into account with a web server install, such as setting up an email server, setting up a firewall, firewalling web applications, and dynamically firewalling attackers.
JDK_HOME="/opt/oracle-jdk-bin-1.7.0.60"
+
JAVAC=${JAVA_HOME}/bin/javac
+
PATH="${JAVA_HOME}/bin:${JAVA_HOME}/jre/bin"
+
ROOTPATH="${JAVA_HOME}/bin:${JAVA_HOME}/jre/bin"
+
LDPATH="${JAVA_HOME}/jre/lib/amd64/:${JAVA_HOME}/jre/lib/amd64/xawt/:${JAVA_HOME}/jre/lib/amd64/server/"
+
MANPATH="/opt/oracle-jdk-bin-1.7.0.60/man"
+
PROVIDES_TYPE="JDK JRE"
+
PROVIDES_VERSION="1.7"
+
BOOTCLASSPATH="${JAVA_HOME}/jre/lib/resources.jar:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/jre/lib/sunrsasign.jar:${JAVA_HOME}/jre/lib/jsse.jar:${JAVA_HOME}/jre/lib/jce.jar:${JAVA_HOME}/jre/lib/charsets.jar:${JAVA_HOME}/jre/classes"
+
GENERATION="2"
+
ENV_VARS="JAVA_HOME JDK_HOME JAVAC PATH ROOTPATH LDPATH MANPATH"
+
VMHANDLE="oracle-jdk-bin-1.7"
+
BUILD_ONLY="FALSE"
+
PORTAGE_ATOM="dev-java/oracle-jdk-bin-1.7.0.60"
+
}}
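
As a rough illustration of how a tool might read the proposed <tt>PORTAGE_ATOM</tt> variable, here is a minimal Python sketch. It is not the actual <tt>java-config</tt> code: the function name <tt>parse_vm_settings</tt> is hypothetical, and the sketch assumes the settings file contains only simple <tt>KEY="value"</tt> lines like the ones shown above. References such as <tt>${JAVA_HOME}</tt> are left unexpanded.

```python
import shlex


def parse_vm_settings(text):
    """Parse KEY="value" lines from a VM settings file into a dict.

    Hypothetical helper: comments and blank lines are skipped, shell quotes
    are stripped, and ${VAR} references are kept verbatim (no expansion).
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # removes the surrounding quotes
        settings[key.strip()] = parts[0] if parts else ""
    return settings


# A shortened copy of the settings file shown above.
SAMPLE = '''
# Copyright 1999-2011 Gentoo Foundation
VERSION="Oracle JDK 1.7.0.60"
JAVAC=${JAVA_HOME}/bin/javac
PROVIDES_TYPE="JDK JRE"
PROVIDES_VERSION="1.7"
VMHANDLE="oracle-jdk-bin-1.7"
PORTAGE_ATOM="dev-java/oracle-jdk-bin-1.7.0.60"
'''

vm = parse_vm_settings(SAMPLE)
print(vm["VMHANDLE"], "->", vm["PORTAGE_ATOM"])
```

The resulting dictionary gives the link between the <tt>java-config</tt> handle (<tt>VMHANDLE</tt>) and the Portage atom that the proposal calls for.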
+
  
With this very minor and easy-to-implement change, new possibilities are available -- namely, for <tt>java-config</tt> code to tap into the Portage API and leverage its advanced dependency handling functionality.
+
=== Firewalls ===
 +
* {{Package|net-firewall/nftables}}
 +
* {{Package|net-firewall/iptables}} <-- suggested
 +
* {{Package|net-firewall/firewalld}}
 +
* {{Package|net-firewall/ufw}}
  
<div style="float: right; width: 40%; margin-left: 1em; margin-top: 1em; margin-bottom: 1em; background-color: #f8f8f8; padding: 0.5em; border-radius: 8px;">
+
=== Dynamic Firewalling ===
 +
* {{Package|app-admin/sshguard}} <-- suggested
 +
* {{Package|net-analyzer/fail2ban}}
  
=== Provides ===
+
=== Webapp Firewalls ===
  
{{fancynote|This section is not a requirement for the implementation of this proposal, but a suggestion for future Portage development.}}
+
Apache has an option for web application firewalling.  As far as [[User:Threesixes|Threesixes]] ([[User talk:Threesixes|talk]]) can tell, this passes login errors and excessive site fuzzing to logs for fail2ban/sshguard to deal with.
 +
https://github.com/nbs-system/naxsi is a web app firewall for nginx.
  
Currently, the functionality that is provided by each JVM is stored in the <tt>java-config</tt> file shown above, namely in the <tt>PROVIDES_TYPE</tt> and <tt>PROVIDES_VERSION</tt> variables. This works for us, though it is unfortunate that the (in my opinion) not completely thought-out [http://wiki.gentoo.org/wiki/GLEP:37 GLEP 37] was implemented, which deprecated the <tt>PROVIDES</tt> variable in Portage. This is a useful variable because it gave us a "link" from the installed package to the virtual it provided. No such link currently exists, which is a reduction in useful functionality.
+
=== Benchmarking ===
 +
It's a good idea to benchmark your system, server, & websites. There are several tools to assist you in doing this.
  
It is recommended that <tt>PROVIDES</tt> is un-deprecated in Portage, and used by ebuilds solely for recording what virtuals are provided by the ebuild, so that they can be queried later once the package is installed. This would allow <tt>PROVIDES_TYPE</tt> and <tt>PROVIDES_VERSION</tt> -- functionality duplicated by the Java tools -- to be removed, further reducing the complexity of the Java tools codebase.
+
* http://toolbar.netcraft.com/site_report?url=undefined#last_reboot
</div>
+
* http://gtmetrix.com/
 
+
* http://www.showslow.com/
=== java-config Dependency Handling ===
+
* http://yslow.org/
 
+
* http://getfirebug.com/
With these changes, <tt>java-config</tt> and <tt>depend-java-query</tt> can tap directly into the power of Portage dependency handling. Let's see how.
+
* {{Package|app-admin/apache-tools}}
 
+
* {{Package|app-benchmarks/sysbench}}
=== Query: List installed VMs ===
+
* {{Package|app-benchmarks/phoronix-test-suite}}
 
+
* {{Package|app-benchmarks/iozone}}
To list installed VMs, <tt>java-config</tt> could look in <tt>/usr/share/java-config-2/vm/</tt> to get a list of all installed Java VMs. However, thanks to the <tt>PORTAGE_ATOM</tt> variable, it can then compile a list of corresponding package atoms in <tt>/var/db/pkg</tt>.
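
The lookup described above could be sketched roughly as follows. This is an assumption-laden illustration, not existing <tt>java-config</tt> code: the function name <tt>installed_vm_atoms</tt> is hypothetical, both directories are passed in as parameters so the sketch can be tried against a temporary tree, and it assumes each installed package has a <tt>category/name-version</tt> directory under <tt>/var/db/pkg</tt>.

```python
import os
import re
import tempfile

# Matches the proposed PORTAGE_ATOM="..." line in a VM settings file.
ATOM_LINE = re.compile(r'^PORTAGE_ATOM="([^"]+)"\s*$', re.MULTILINE)


def installed_vm_atoms(vm_dir, vdb_dir):
    """Map each VM settings file name to (atom, found_in_vdb).

    Hypothetical helper: vm_dir stands in for /usr/share/java-config-2/vm
    and vdb_dir for /var/db/pkg.
    """
    result = {}
    for handle in sorted(os.listdir(vm_dir)):
        with open(os.path.join(vm_dir, handle)) as fh:
            match = ATOM_LINE.search(fh.read())
        if match is None:
            continue  # file has no PORTAGE_ATOM variable yet
        atom = match.group(1)
        result[handle] = (atom, os.path.isdir(os.path.join(vdb_dir, atom)))
    return result


# Try it against a throwaway directory tree rather than the live system.
with tempfile.TemporaryDirectory() as root:
    vm_dir = os.path.join(root, "vm")
    vdb_dir = os.path.join(root, "pkg")
    os.makedirs(vm_dir)
    os.makedirs(os.path.join(vdb_dir, "dev-java", "oracle-jdk-bin-1.7.0.60"))
    with open(os.path.join(vm_dir, "oracle-jdk-bin-1.7"), "w") as fh:
        fh.write('VMHANDLE="oracle-jdk-bin-1.7"\n')
        fh.write('PORTAGE_ATOM="dev-java/oracle-jdk-bin-1.7.0.60"\n')
    demo = installed_vm_atoms(vm_dir, vdb_dir)

print(demo)
```

Settings files without a <tt>PORTAGE_ATOM</tt> line are skipped here; a real implementation would need to decide how to treat VMs installed before the variable was introduced.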
+
* {{Package|app-benchmarks/piozone}}
 
+
* {{Package|app-benchmarks/siege}}
=== Query: Is a Suitable or Best VM Selected? ===
+
* {{Package|app-benchmarks/ramspeed}}
 
+
* {{Package|app-benchmarks/jmeter}}
Currently, the <tt>java-utils-2.eclass</tt> attempts to magically select the proper VM for building based on the contents of the <tt>DEPEND</tt> string. This is how this functionality would be implemented correctly, using the Portage API rather than custom code.
+
 
+
To perform this query correctly and efficiently, <tt>depend-java-query</tt> can compile a list of Portage atoms for all installed Java VMs, along with what virtual they provide using <tt>PORTAGE_ATOM</tt>, <tt>PROVIDES_TYPE</tt> and <tt>PROVIDES_VERSION</tt>. Then, the <tt>DEPEND</tt> handed to it can be correctly evaluated using Portage functions. This is how the Portage side of the algorithm would work:
+
 
+
# Pre-process <tt>DEPEND</tt> string:
+
## Remove all components that are not related to <tt>virtual/jdk</tt> and <tt>virtual/jre</tt>.
+
## Evaluate <tt>DEPEND</tt> string using currently-active USE settings to remove all conditionals.
+
# Create a <tt>fakedbapi</tt> and <tt>cpv_inject()</tt> the current system VM Portage atom into it.
+
# Evaluate the pre-processed <tt>DEPEND</tt> string and see if the dependencies are satisfied.
+
# If so, a suitable VM is selected.
+
 
+
To find the "best match" VM, a similar process would be followed. Instead of <tt>cpv_inject</tt>ing the current system VM, ''all'' installed VMs would be injected and a best match would be found.
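
The shape of this algorithm can be illustrated with a small, self-contained Python sketch. It deliberately does '''not''' use the Portage API (no <tt>fakedbapi</tt> or <tt>cpv_inject()</tt>), so that it runs anywhere; it is a simplified stand-in, and every function name in it is hypothetical. It assumes only <tt>flag? ( ... )</tt> conditionals and plain versioned atoms (no <tt>||</tt> groups, slots or revisions), and it evaluates USE conditionals before filtering for Java virtuals, which gives the same result as the order listed above. The VM <tt>example-jre-1.6</tt> is made up for the example.

```python
import operator
import re

# Only atoms for the two Java virtuals are of interest.
ATOM_RE = re.compile(r"^(>=|<=|=|>|<)?(virtual/(?:jdk|jre))(?:-(\d+(?:\.\d+)*))?$")
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq}


def reduce_use_conditionals(tokens, use):
    """Flatten `flag? ( ... )` groups, keeping branches enabled by `use`."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.endswith("?"):
            if i + 1 >= len(tokens) or tokens[i + 1] != "(":
                raise ValueError("expected '(' after " + tok)
            depth, j = 1, i + 2
            while depth:  # find the matching closing parenthesis
                if tokens[j] == "(":
                    depth += 1
                elif tokens[j] == ")":
                    depth -= 1
                j += 1
            flag = tok[:-1]
            enabled = flag[1:] not in use if flag.startswith("!") else flag in use
            if enabled:
                out.extend(reduce_use_conditionals(tokens[i + 2:j - 1], use))
            i = j
        else:
            out.append(tok)
            i += 1
    return out


def java_atoms(depend, use):
    """Return (operator, virtual, version) tuples for the Java virtuals."""
    tokens = reduce_use_conditionals(depend.split(), use)
    return [m.groups() for m in map(ATOM_RE.match, tokens) if m]


def version_tuple(version):
    return tuple(int(part) for part in version.split("."))


def satisfies(vm, atom):
    """Check one VM's PROVIDES_TYPE/PROVIDES_VERSION against one atom."""
    op, name, version = atom
    kind = name.split("/")[1].upper()  # "JDK" or "JRE"
    if kind not in vm["PROVIDES_TYPE"].split():
        return False
    if version is None:
        return True
    have = version_tuple(vm["PROVIDES_VERSION"])
    return OPS[op or "="](have, version_tuple(version))


def suitable_vms(depend, use, vms):
    """Return the handles of all VMs that satisfy every Java atom."""
    atoms = java_atoms(depend, use)
    return [vm["VMHANDLE"] for vm in vms if all(satisfies(vm, a) for a in atoms)]


VMS = [
    {"VMHANDLE": "oracle-jdk-bin-1.7", "PROVIDES_TYPE": "JDK JRE",
     "PROVIDES_VERSION": "1.7"},
    {"VMHANDLE": "example-jre-1.6", "PROVIDES_TYPE": "JRE",  # hypothetical VM
     "PROVIDES_VERSION": "1.6"},
]
DEPEND = ">=virtual/jdk-1.6 dev-java/ant-core test? ( >=virtual/jdk-1.8 )"

print(suitable_vms(DEPEND, set(), VMS))
print(suitable_vms(DEPEND, {"test"}, VMS))
```

Passing only the current system VM answers the "is a suitable VM selected?" question, while passing all installed VMs gives the candidate list from which a best match could be chosen. How to rank the candidates is left open here, as it is in the text above.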
+
 
+
== Future Directions ==
+
 
+
=== Reusing Code, Merging into Portage ===
+
 
+
Since this functionality could be useful to other eclasses, this could be used as a test for a more generic helper function that could be integrated into Portage. This would allow other eclasses to implement similar functionality without having to have their own custom helper applications.
+
 
+
The <tt>java-config</tt> settings file could be merged into <tt>/var/db/pkg</tt> and a simple API could be added to <tt>vartree</tt> and <tt>dblink</tt> to provide API access to it. This would provide a toolkit for eclasses for storing extra configuration settings with installed ebuilds, which could be useful for a variety of purposes.
+
 
+
All these goals support the idea of code re-use and maintainability, and addressing the problem at the correct architectural level.
+
 
+
[[Category:Portage]]
+
[[Category:FLOP]]
+
[[Category:Java]]
+
__NOEDITSECTION__
+

Latest revision as of 13:55, January 18, 2015

Pre-install considerations

ssl

SSL wildcard certificates allow one certificate to cover several subdomain names. For example, https://wiki.funtoo.org, https://www.funtoo.org and https://forums.funtoo.org can all use the same certificate. https://funtoo.org itself would not be covered by the wildcard, so Threesixes (talk) suggests using http://domain.tld as a plain HTTP navigation splash page. All that is required to set up a CA-signed SSL certificate is an email address on the server. https://www.startssl.com offers free CA-signed SSL certificates, though there are several other certificate providers. Many web apps require you to set your URL, and will have problems if it is set to http:// rather than https://.

sockets vs tcp stack

Unix sockets have less overhead but cannot be shared across jails or with other machines. TCP has more overhead but is far more flexible.

Email Servers

FTP Servers

It is common practice to use FTP servers to host files for downloading.

Webserver

Web servers come in several varieties. The most common stack is known as LAMP, which stands for Linux, Apache, MySQL, PHP. Threesixes (talk) suggests setting up the web server stack by selecting the database first, the scripting language second, and the web server third.

Databases

MariaDB is a drop-in replacement for MySQL.

Percona Server is a drop-in replacement for MySQL.

  • dev-db/percona-server (package not on wiki - please add)
  • dev-db/postgresql-server (package not on wiki - please add)
  • dev-db/sqlite (package not on wiki - please add)

Languages

Web Servers


SSL Termination, Reverse Proxies, & load balancing

Reverse proxies are useful: some cache static data and serve cached pages rather than hitting the web server. Some pass requests to backend nodes, providing high-availability clustering for your website. Some web servers have this functionality built in.

  • Nginx
  • net-misc/stunnel (package not on wiki - please add)
  • www-servers/pound (package not on wiki - please add) <-- suggested for ssl termination & load balancing
  • Varnish <-- suggested for caching to reduce power consumption & reduce the need of constantly rebuilding pages
  • Squid

Post install

There are several considerations to take into account with a web server install, such as setting up an email server, setting up a firewall, firewalling web applications, and dynamically firewalling attackers.

Firewalls

  • net-firewall/nftables (package not on wiki - please add)
  • Iptables <-- suggested
  • net-firewall/firewalld (package not on wiki - please add)
  • net-firewall/ufw (package not on wiki - please add)

Dynamic Firewalling

Webapp Firewalls

Apache has an option for web application firewalling. As far as Threesixes (talk) can tell, this passes login errors and excessive site fuzzing to logs for fail2ban/sshguard to deal with. https://github.com/nbs-system/naxsi is a web app firewall for nginx.

Benchmarking

It's a good idea to benchmark your system, server, & websites. There are several tools to assist you in doing this.