Posts tagged with 'Code'

Code, Apps and Design Principles

  • Posted on November 17, 2012 at 4:13 pm

You probably know the terms eye candy or dancing pigs. They apply to applications („apps“) running on computing platforms. They especially apply to apps running on devices that can be thrown easily. Widgets, nice colours and dancing pigs beat a sound security design every time. Since this posting is not about bashing Whatsapp (it’s really about sarcasm), here’s a list of advice for app „developers“.

  • If you are in need of unique identifiers (UIDs), please always use information that can never be guessed and is very hard to obtain. Telephone numbers, names, e-mail addresses and device identifiers are closely guarded secrets which will never be disclosed, thus they are a very good choice.
  • If you are in the position of having to use easily guessable information for unique identifiers, make sure you scramble the information appropriately. For example you can use the MAC address of network devices, you just have to run it through an irreversible function. MD5(MAC) yields a result that can never be mistaken for a MAC address and cannot be reversed, so it is totally safe (see the sketch after this list for just how safe).
  • Everything you cannot understand is very safe. This is why you should never take a closer look at algorithms. Once you understand them, the attacker will do so, too. Then your advantage is lost. Make sure you never know why you are selecting a specific algorithm regardless of its purpose.
  • Always try to invent your own protocols. Since every protocol in existence was invented by amateurs with too much time on their hands, you can always do better.
  • Never reuse code. Libraries are for lazy bastards who cannot code. Rewrite every function and framework from scratch and include it into your software.
  • Put the most work into the user interface. If it looks good, the rest of the code automatically becomes very secure and professional. This goes for encryption as well. Most encryption algorithms can easily be replaced by the right choice of colours.
  • Getting reverse engineered is never a problem if you explicitly forbid it in the terms of usage. Everyone reads and accepts this.
  • Aim for a very high number of users. Once you hit 100,000 or 1,000,000 users, your software will be so popular that no one will ever attack it, because, well, it’s too popular. Accept it as a fact, it’s too complicated to explain at this point.
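About that „totally safe“ MD5(MAC) scheme: a MAC address is a public 3-byte vendor prefix (OUI) plus a 3-byte NIC part, so for a known vendor there are only 2^24 candidates to hash. Here is a minimal sketch of what any attacker will do, assuming OpenSSL’s libcrypto is available; the target MAC is made up, of course.

    // Why MD5(MAC) is not anonymisation: brute-force the 2^24 NIC
    // suffixes for a known vendor prefix. Build with: g++ demo.cpp -lcrypto
    #include <openssl/md5.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        // Hypothetical target: the app stored MD5("00:1A:2B:xx:xx:xx").
        unsigned char target[MD5_DIGEST_LENGTH];
        const char* secret = "00:1A:2B:12:34:56";
        MD5(reinterpret_cast<const unsigned char*>(secret), strlen(secret), target);

        char candidate[32];
        unsigned char digest[MD5_DIGEST_LENGTH];
        for (unsigned long n = 0; n < (1UL << 24); ++n) {
            snprintf(candidate, sizeof(candidate), "00:1A:2B:%02lX:%02lX:%02lX",
                     (n >> 16) & 0xffUL, (n >> 8) & 0xffUL, n & 0xffUL);
            MD5(reinterpret_cast<const unsigned char*>(candidate),
                strlen(candidate), digest);
            if (memcmp(digest, target, MD5_DIGEST_LENGTH) == 0) {
                printf("recovered MAC: %s\n", candidate); // a matter of seconds
                return 0;
            }
        }
        return 1;
    }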

Go and spread the word. I can’t wait to see more apps following these simple principles become available in the app stores all over the world.


Growing up in a Hacker Space without knowing about it

  • Posted on April 11, 2012 at 2:26 am

This blog posting is a bit different from all the others. Usually it’s more about sarcasm or bashing things or people. Today it is the complete opposite. If you look at the tag cloud and have read some postings (or tweets for that matter) you have probably realised that I do some hacking behind the scenes. Let’s call it tinkering with technology. Basically I learnt a lot because my family allowed me to learn and to develop skills. Let me tell you what this was like.

I’ve always been the curious type. I constantly tried to figure out how things work, even as a child. Most children do that, but I liked to take apart gadgets very early. The curiosity grew intense. My parents and grandparents forbade me to open any household gadget that was new or still in use. Back in the day appliances were repaired, not replaced. So one of my chances to get a peek inside was to wait until something broke and a repairman (be it an electrician, a plumber or a heating contractor) came to our house. I was happy whenever our TV set was broken, because I got a look inside and could observe what the electrician did. I always kept the circuit diagrams of our devices although I couldn’t read them properly yet. Those were part of the manual (I grew up in the age before „intellectual property“ was invented out of thin air; people were still allowed to repair their own possessions back then).

My family recognised my curiosity. I got lots of books. I read them. My grandfather also gave his support by buying science kits for me. One chemistry set, two physics sets and countless electronic kits found their way to our home. I had lots of electronic components ranging from transistors, coils, transformers, capacitors, LEDs (yes, only the red ones), LED displays, a cathode ray tube, a 10 MHz oscilloscope, a soldering iron and cable to countless other items. First I built the experiments according to the manual (building test circuits up to sound generators, radios and even a simple black/white TV set), then I started to try my own ideas. I could even use my grandfather’s workshop in the basement. He was a mechanic, and his workshop had everything – screwdrivers of any size, power drills, files, soldering lamps, paint, solvent, pieces of metal, pipes, really anything. And I could use all of these tools whenever I wanted to.

One Christmas day (I guess it was 1984) the electronic kit collection turned digital. My grandfather gave me a BUSCH Elektronik Microtronic Computer-System 2090 with a 4-bit TMS 1600 CPU at its core. 4096 bytes of ROM, 64 + 512 bytes of RAM, 40 assembler instructions and 12 commands at the console consisting of 26 keys and a 6-digit LED display greatly enhanced the capabilities of my little lab. I started coding. The series of presents from my grandfather continued with a Commodore C64, a C128 and an Amiga 500/2000/4000, not to forget the HP48 calculator I used at university.

I am not writing this down to brag about it; I am well aware that not everyone has been lucky enough to have a family like this. The point is this: even when my grandfather gave me the electronic kits, he did not understand what I was doing with them. He had a basic understanding of electricity, he could fix electrical wiring in the house, but he never did more complex things. He was a master mechanic; he could build anything out of wood or metal. Despite having no interest in and no knowledge of electronics and computing, he tried to help me with my education. Growing up with books, hardware, software and a workshop – and with an environment that actively supports curiosity – is one of the best things that can happen to you. That’s what a hacker space is – the best that can happen to you. Cherish it! Support it! Improve it! Create it if it doesn’t exist! And always put the tools back where they belong! My grandfather told me this over and over.

Sadly I cannot thank my grandfather any more. He died a couple of years ago. He would have turned 90 today.

If you want to do him a favour, then please create something or understand the workings of Nature. He would have liked it.

Ruby on Rails – continued madness

  • Posted on January 16, 2011 at 4:41 pm

I am still at odds with Ruby and Rails, but I now know why the stupid gems are not available as Debian packages. The Ruby people insist on using their Gem tool as package manager. Debian already has a very capable package manager. Having two package managers is a bad idea. Other projects seem to manage well. Take Perl for example. Perl has CPAN, but you can build a Debian package out of any module found on CPAN. It remains a mystery why the Ruby Gems refuse to work this way.
It seems fair to warn everyone intending to use Ruby: „Please install this crap from source and forget about your distribution’s package manager!”

Speaking of gems, here’s another one. I have recreated the production environment in order to simulate the intended upgrade path. I have installed Rails 3.0.3 instead of the Rails 2.1.2 package found on the production environment. I just want to see what happens. After changing the version number in the environment.rb file of the application I get the no such file to load — initializer error message. Great, a file or some files are missing, but no mention of which files. Checking the logs reveals that the Rails 2.1.2 gem is missing. Good thing we’re back to the „you’ve got to have multiple versions of the same gem installed, just to feel good about it” mindset. I am a fool to assume that high level languages are designed to protect sysadmins and developers from the peculiarities of specific versions of the components involved.

Update: OK, finally I found the strategy I wanted to avoid. Remove all Ruby Debian packages, get the Ruby Enterprise Edition (which is basically a fork of Ruby in order to incorporate sanity) and install it. Kudos to Debian for not letting this crap on board, kudos to the REE developers for delivering this solution, and a heartfelt and sincere fuck you to Ruby!

True Silent Night

  • Posted on December 25, 2010 at 1:09 am

I logged off early yesterday and will do so again in a moment. I just had to write one more e-mail and read another one. That’s why I got stuck in the blog editor. Finally I can appreciate the silence of the night. I’m a night person. I wear the cloak of the night as a shield, simply because the noise of the day stops. And so this is a truly silent night, for almost everyone has stopped and gone away. The only thing that’s missing is snow. That would be perfect. Maybe next year then.

Speaking of silence, I have nearly finished the prototype of the XMPP (Jabber protocol) robot I am trying to code. Having software communicate via XMPP has its advantages. A side effect is exploring the C++ gloox library, which is really a fantastic piece of code. Getting a client to work doesn’t take much time. You just have to extend your client class with the functionality you need and you’re done. Another nice thing about gloox and XMPP is that SSL/TLS is already included (provided you use XMPP servers that are configured to use SSL/TLS). Coding will continue on Monday.
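To give you an idea of how little code this takes, here is a minimal sketch of an echo bot, assuming gloox 1.x; the JID and password are placeholders and error handling is left out.

    // Minimal XMPP echo bot with gloox (link with -lgloox).
    #include <gloox/client.h>
    #include <gloox/message.h>
    #include <gloox/messagehandler.h>

    class EchoBot : public gloox::MessageHandler {
    public:
        EchoBot() : client_(gloox::JID("bot@example.org/robot"), "secret") {
            client_.registerMessageHandler(this);
        }
        void run() {
            // connect(true) blocks and runs the receive loop; TLS is
            // negotiated automatically if the server offers it.
            client_.connect(true);
        }
        // gloox calls this for every incoming message stanza.
        virtual void handleMessage(const gloox::Message& msg,
                                   gloox::MessageSession* /*session*/) {
            gloox::Message reply(gloox::Message::Chat, msg.from(), msg.body());
            client_.send(reply);
        }
    private:
        gloox::Client client_;
    };

    int main() {
        EchoBot bot;
        bot.run();
        return 0;
    }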

Now is the time to enjoy the silence.

Rain!

  • Posted on July 15, 2010 at 9:51 pm

Two pieces of good news: a) It’s raining. b) I dumped Xalan/Xerces in favour of libxslt from the GNOME project.
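For the record, the whole transformation boils down to a few calls with libxslt; a minimal sketch (file names are placeholders, error checks omitted):

    // Minimal XSLT transformation with libxslt/libxml2; build with
    // g++ demo.cpp $(xslt-config --cflags --libs)
    #include <libxml/parser.h>
    #include <libxslt/xslt.h>
    #include <libxslt/xsltInternals.h>
    #include <libxslt/transform.h>
    #include <libxslt/xsltutils.h>
    #include <cstdio>

    int main() {
        xsltStylesheetPtr style = xsltParseStylesheetFile(
            reinterpret_cast<const xmlChar*>("strip-tags.xsl"));
        xmlDocPtr doc = xmlParseFile("input.xml");
        // Apply the stylesheet and write the result to stdout.
        xmlDocPtr result = xsltApplyStylesheet(style, doc, NULL);
        xsltSaveResultToFile(stdout, result, style);

        xmlFreeDoc(result);
        xmlFreeDoc(doc);
        xsltFreeStylesheet(style);
        xsltCleanupGlobals();
        xmlCleanupParser();
        return 0;
    }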

XML Parsers considered harmful

  • Posted on July 6, 2010 at 8:08 pm

I am fighting with Xalan and Xerces (in C++). After looking for decent tutorials (there are none) I found this little gem among the Google results. It clearly emphasises what I already know: XML parsers are from Hell. Xalan & Xerces are especially tricky since they’ve been ported from Java. The API is a bit weird. Some things contradict intuition. For example, if you initialise the transformation engines more than once per process run, the destructor for the XSLTInputSource crashes with SIGSEGV. You get no clue. You return, and just as the objects go out of scope your program crashes. The secret is hidden in the API documentation. And you cannot easily stop the XSLT transformer from downloading/accessing the document’s DTD. You have to provide your own EntityResolver class that resolves all entities without the DTD, if you wish to ignore it. Charming. Bureaucratic. Have I mentioned Java already?
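For the curious, the EntityResolver trick looks roughly like this, assuming Xerces-C++ 3.x: a resolver that hands back an empty buffer instead of fetching the DTD.

    // Keep Xerces from fetching external DTDs: resolve every external
    // entity to an empty in-memory document.
    #include <xercesc/sax/EntityResolver.hpp>
    #include <xercesc/sax/InputSource.hpp>
    #include <xercesc/framework/MemBufInputSource.hpp>

    class NoDtdResolver : public xercesc::EntityResolver {
    public:
        // Called whenever the parser wants an external entity (e.g. the
        // DTD); Xerces deletes the returned InputSource when done.
        virtual xercesc::InputSource* resolveEntity(
                const XMLCh* const /*publicId*/,
                const XMLCh* const /*systemId*/) {
            return new xercesc::MemBufInputSource(
                reinterpret_cast<const XMLByte*>(""), 0, "empty");
        }
    };

    // Usage: NoDtdResolver resolver; parser->setEntityResolver(&resolver);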

(Image: Google result for XML parser software, complete with a malware warning.)

XML considered harmful.

If you know a decent and light-weight XSLT transformation library, let me know. I just need to delete tags from HTML, XHTML and XML documents (which worked well with regular expressions before). The XSLT template is quite short, and the task isn’t very complicated.

O_NOATIME + NFS = EPERM

  • Posted on June 28, 2010 at 12:26 am

I met a surprise today. I am writing code that accesses a lot of files via NFSv4, stat()s them, possibly extracts content and writes stuff into a couple of databases. At some point in the debug/development cycle a stat() call returned Resource temporarily unavailable (a.k.a. EAGAIN and EWOULDBLOCK). I tried replacing stat() with lstat() and finally with fstat() in order to assert more control over the flags provided to open(). The combination O_RDONLY | O_SYNC | O_NOATIME changed EAGAIN into EPERM (Operation not permitted). Why is that? Well, here’s a hint.

The O_NOATIME flag was specified, but the effective user ID of the caller did not match the owner of the file and the caller was not privileged (CAP_FOWNER).

Correct. I had changed the machine the test ran on. This turned the effective UID into something different (the NFS share showed the numerical 4294967294, which is not the UID of the development account). I’d never have expected this behaviour from the description in the man pages… despite the quoted sentence above… which is really part of man 2 open.
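The pragmatic way out (apart from reading the manual earlier) is to treat O_NOATIME as an optimisation and retry without it; a small sketch of what I mean:

    // Open with O_NOATIME where the kernel lets us, fall back on EPERM
    // (e.g. effective UID != file owner, easily triggered via NFS with
    // squashed IDs). O_NOATIME is Linux-specific (_GNU_SOURCE).
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int open_for_indexing(const char* path) {
        int fd = open(path, O_RDONLY | O_NOATIME);
        if (fd == -1 && errno == EPERM) {
            // Not our file as far as the server is concerned: retry
            // without O_NOATIME instead of failing the whole run.
            fd = open(path, O_RDONLY);
        }
        return fd;
    }

    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        int fd = open_for_indexing(argv[1]);
        if (fd == -1) { perror("open"); return 1; }
        struct stat st;
        if (fstat(fd, &st) == 0)
            printf("%s: %lld bytes\n", argv[1], (long long)st.st_size);
        close(fd);
        return 0;
    }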

RTFM. Again.

Text Decoding – Progressing

  • Posted on June 13, 2010 at 9:27 pm

The new document class takes shape. I now have a prototype that can extract meta information from the filesystem (not that hard), detect/guess the encoding in the case of plain text files (with the help of the ICU library), strip markup language (by replacing all tags) and detect/convert PDF documents (with the help of the PoDoFo PDF library). Converting HTML to text is a bit of a hack in C++. Maybe it is easier to use XSLT and let an XML library do the transformation as a whole. In theory HTML was built for this. However, I still need to strip the tags of unknown XML documents in case someone wants to index XML stuff.
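The encoding guessing is less code than it sounds. Here is a sketch of roughly how it can be done with ICU’s ucsdet API; this is a boiled-down illustration, not the actual prototype code.

    // Guess the charset of a text buffer with ICU (link with
    // -licui18n -licuuc). Returns ICU's best guess, e.g. "UTF-8".
    #include <unicode/ucsdet.h>
    #include <string>

    std::string guess_encoding(const char* data, int32_t len) {
        UErrorCode status = U_ZERO_ERROR;
        UCharsetDetector* det = ucsdet_open(&status);
        ucsdet_setText(det, data, len, &status);
        // ucsdet_detect() returns the match with the highest confidence;
        // ucsdet_detectAll() would give the whole ranked list.
        const UCharsetMatch* match = ucsdet_detect(det, &status);
        std::string name;
        if (U_SUCCESS(status) && match != NULL)
            name = ucsdet_getName(match, &status);
        ucsdet_close(det);
        return name;
    }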

I forgot to extract the meta information from the HTML header. RegEx++; Dublin Core or non-standard tags, that is the question.


Text Heaps, Code and Rain

  • Posted on May 30, 2010 at 3:40 pm

May passed more quickly than planned. I therefore didn’t see as much of the rain as others did. Apart from that, it was the most pleasant May in a long time. Anyone who knows the coolness of the Hessian forests will be very unpleasantly surprised by the desert climate of Vienna. I prefer a solid 17°C to anything above 25°C any time. Unfortunately 30°C is on the cards for next week…

The code for indexing text heaps has grown and has been tested with corpora of 60,000+ documents. The findings led to the removal of several bugs. Anyone who thought so far that the rubbish on file servers consists of well-defined and accessible documents should urgently question that attitude. File formats such as PDF, ODT, ODP or ODS are very accessible and can usually be converted into an indexable form. XLS and PostScript® follow close behind. With DOC it can happen that the file is actually RTF text while the extension doesn’t show it. Then there are DOC files that were filled via cut & paste with text in a strange encoding; after normalisation you end up with text that cannot be converted to UTF-8. Encoding is a big problem in general, since TXT and HTM(L) files rarely if ever declare the encoding they use. This is exactly why web browsers ship code that guesses encodings.

File formats are the next problem. The indexer converts all interesting documents into plain text, since only plain text gets indexed. Command-line converters do not exist for all formats. OOXML comes to mind, closely followed by proprietary e-book formats. Such formats currently fall through the cracks.

Hear ye, people, and let it be said: text formats fill me with dread. So far PDF, PostScript® and the OpenOffice formats are my favourites.

Text Adventures

  • Posted on May 18, 2010 at 1:31 pm

May is packed more densely than I thought. The week before last I spent at the Linuxwochen Vienna in the Old City Hall. You don’t see GNU/Linux paired with baroque architecture every day. For reasons of historical preservation there was only a wireless network. Now I am devoting myself to other problems again and to a lot of text – in the form of code and actual text. I am testing CLucene and the PostgreSQL text indexer on documents from „real“ life. The questions this raises are harder to answer than the documentation suggests.

First you have to get at the actual text. There is a heap of document formats – .doc, .pdf, .odt, .html, .xml, .rtf, .odp, .xls, .txt, .ps, … . These have to be normalised before they can be indexed. You need the pure text, the individual words, and nothing else. On top of that the character encoding should be uniform. UTF-8 is the obvious choice, so you need a sufficient set of converters. Since some of the document formats are proprietary or simply badly designed, this is not a trivial task. I have found enough converters, but some are better than others. The tests on the document collections will show how good they really are.

Then comes language. Indexing text reduces the words in it to their stem form and removes stop words. Both depend on the language of the document. Unfortunately the language is not provided as meta information in all formats, so you have to determine it. For that you can consult the publication N-Gram-Based Text Categorization, or rather use one of its implementations. What happens with texts of mixed language?
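The core of the Cavnar/Trenkle approach fits in a few lines; here is a toy sketch of the „out-of-place“ distance between n-gram frequency profiles (real language profiles are built from training corpora, which I omit here).

    // Toy sketch of N-Gram-Based Text Categorization (Cavnar/Trenkle):
    // build a rank-ordered character n-gram profile and compare it to a
    // language profile with the "out-of-place" measure.
    #include <algorithm>
    #include <cstdlib>
    #include <map>
    #include <string>
    #include <vector>

    // Character n-grams of the text, sorted by descending frequency.
    std::vector<std::string> ngram_profile(const std::string& text, size_t n) {
        std::map<std::string, int> freq;
        for (size_t i = 0; i + n <= text.size(); ++i)
            ++freq[text.substr(i, n)];
        std::vector<std::pair<int, std::string> > ranked;
        for (std::map<std::string, int>::iterator it = freq.begin();
             it != freq.end(); ++it)
            ranked.push_back(std::make_pair(-it->second, it->first));
        std::sort(ranked.begin(), ranked.end());
        std::vector<std::string> profile;
        for (size_t i = 0; i < ranked.size(); ++i)
            profile.push_back(ranked[i].second);
        return profile;
    }

    // Sum of rank differences; unknown n-grams get the maximum penalty.
    // The language profile with the smallest distance wins.
    size_t out_of_place(const std::vector<std::string>& doc,
                        const std::vector<std::string>& lang) {
        size_t d = 0;
        for (size_t i = 0; i < doc.size(); ++i) {
            std::vector<std::string>::const_iterator it =
                std::find(lang.begin(), lang.end(), doc[i]);
            d += (it == lang.end())
                     ? lang.size()
                     : (size_t)std::labs((long)(it - lang.begin()) - (long)i);
        }
        return d;
    }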

The list is long. The code is C++, and I am missing a nice, extensible class that reads documents, normalises them to UTF-8 and pure text, and extracts some meta information. So far I have found nothing I would want to use. I will have a go at it myself. I can already normalise HTML and XML. For PDF I recommend the excellent PoDoFo library (which will receive donations from me). For the rest I am still looking.

Speaking of words: does anyone know the language Yup’ik? It is spoken by very few Inuit in Alaska and thereabouts. It has words that other languages would render as whole sentences. For example, Kaipiallrulliniuk means roughly: „The two of them were apparently really hungry.” Fascinating.

Thoughts about fsync() and caching

  • Posted on March 19, 2010 at 10:54 pm

I am currently reading material from a talk about caching and how developers (or sysadmins) reliably get data from memory to disk. I found this gem I want to share with you.

fsync on Mac OS X: Since on Mac OS X the fsync command does not make the guarantee that bytes are written, SQLite sends a F_FULLFSYNC request to the kernel to ensure that the bytes are actually written through to the drive platter. This causes the kernel to flush all buffers to the drives and causes the drives to flush their track caches. Without this, there is a significantly large window of time within which data will reside in volatile memory — and in the event of system failure you risk data corruption.

It’s from the old Firefox-hangs-for-30-seconds-on-some-systems problem, described in the fsyncers and curveballs posting. Did you catch the first sentence? „Since on Mac OS X the fsync command does not make the guarantee that bytes are written”. This is a nice one, especially if programmers think that fsync() really flushes the buffers. It doesn’t always do that. And in case you want to be deprived of sleep, go and read the wonderful presentation titled Eat My Data. It’s worth it.
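If you want your own code to follow SQLite’s example, the pattern is small; a sketch (macOS-only F_FULLFSYNC, with plain fsync() as the fallback):

    // Really flush a file: on Mac OS X fsync() does not force the
    // drive's write cache, so ask for F_FULLFSYNC first.
    #include <fcntl.h>
    #include <unistd.h>

    int full_sync(int fd) {
    #ifdef F_FULLFSYNC
        // Flushes kernel buffers *and* the drive's track cache (macOS).
        // Some filesystems do not support it, hence the fallback below.
        if (fcntl(fd, F_FULLFSYNC) == 0)
            return 0;
    #endif
        return fsync(fd);
    }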

Seriously Debugging the Text Indexer Code

  • Posted on February 28, 2010 at 4:55 pm

After feeling like wading through honey during the past weeks I finally got around to squashing some bugs in my text indexer code. The first one was the obligatory use of a null pointer in rare cases. I know, this should never happen. Found it, squashed it. Won’t happen again (I am pretty confident about this).

The next problem was a wrong string comparison when dealing with file extensions. Ignoring the “.” leads to “ps” matching “props”. The latter is no PostScript® file and cannot be indexed (well, it can be, but it shouldn’t). The “.” is never ignored from now on.
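In other words, the comparison now includes the dot; a trivial sketch of the corrected check:

    // Compare the file extension including the leading dot, so that
    // "props" no longer matches ".ps".
    #include <string>

    bool has_extension(const std::string& filename, const std::string& ext) {
        // ext is expected with its dot, e.g. ".ps"
        return filename.size() >= ext.size() &&
               filename.compare(filename.size() - ext.size(),
                                ext.size(), ext) == 0;
    }

    // has_extension("props", ".ps")   -> false
    // has_extension("file.ps", ".ps") -> true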

The test data consists of 3755 files. After filtering, 648 documents remain (file extensions .doc, .htm, .html, .odp, .ods, .odt, .ps, .pdf, .php, .rtf, .txt, .xml, .xls). The results are indexed by means of the PostgreSQL text index function. The resulting database has a table size of 488 kiB (23 MiB documents, 19 MiB text index). Indexing works fairly well so far. The database should be more than sufficient for testing the front end. I’ll probably have a go at the content of the two Cryptome.org DVDs I ordered a couple of weeks ago. Both DVDs contain 42914 files in 1106 directories. The total size is over 8 GiB. Maybe I’ll publish the front end URL to the indexed Cryptome data. Let’s see.


The Joy of High Level Languages

  • Posted on February 16, 2010 at 6:06 pm

When programming you should use a high level programming language. This is important since you never again have to deal with the intricacies of the platform you are working on. Coding becomes paradise. And the Earth is flat, and pigs can fly. I’ve spent over ten days tracking down a problem with Awstats that made it stop updating the web statistics. The configuration was copied from the old server, as were all the logs, the previous configurations, everything. Yet Awstats did not generate new statistics.

Finally I found an inconspicuous line in the logs. It went: „Warning: Error while retrieving hashfile: Byte order is not compatible at ../../lib/Storable.pm” It’s just a warning, so it’s nothing to be worried about, right? And since we use a high level language, surely the change from 32 bit to 64 bit Debian cannot make a difference, right? We code at a high level; we do not deal with byte orders and other worldly stuff anymore. We are enlightened. And obviously we are fucked. Thanks to a hint on a blog somewhere the web statistics are working again.

I will continue my text indexer project today. It’s written in C++.

