Install Apache Tika on Debian

  1. Update Java (your current Java should be Java 7 or higher; if it is already up to date, proceed to step 2).

    echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee /etc/apt/sources.list.d/webupd8team-java.list

    echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list

    apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886

    apt-get update

    apt-get install oracle-java8-installer

    Now check the Java version, e.g. java -version

    It should show something like

    java version "1.8.0_25"

    Java(TM) SE Runtime Environment (build 1.8.0_25-b17)

    Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
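
The version check above can also be scripted. The following is a minimal shell sketch, not part of the original steps: java_major is a hypothetical helper name, and the script assumes the version string is the quoted text on the first line of the java -version output.

```shell
# Hypothetical helper: extract the major Java version from a version string.
# "1.8.0_25" gives 8 (legacy 1.x scheme); "11.0.2" gives 11 (newer scheme).
java_major() {
    ver="$1"
    case "$ver" in
        1.*) ver="${ver#1.}" ;;   # drop the leading "1." of the legacy scheme
    esac
    # Keep everything before the first ".", "_" or "-".
    echo "${ver%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it before parsing.
    current="$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)"
    if [ "$(java_major "$current")" -ge 7 ] 2>/dev/null; then
        echo "Java $current is new enough"
    else
        echo "Java $current looks too old (or could not be parsed)"
    fi
else
    echo "java not found, install it first"
fi
```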

    If it is still an old version, you can follow the link below

    2. Download Tika

       b. Unzip: e.g. unzip tika-1.6-src.zip

    3. Install Maven, which Tika uses as its build system

       b. Unzip: e.g. unzip apache-maven-3.2.3-bin.zip

       c. Follow the installation guide here: http://maven.apache.org/download.cgi#Installation
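
The installation guide linked above essentially comes down to putting Maven's bin directory on your PATH. A minimal sketch, assuming the archive was unzipped in the current directory; MAVEN_DIR is a variable name chosen here for illustration, not something the guide requires:

```shell
# Assumption: apache-maven-3.2.3-bin.zip was unzipped in the current directory.
MAVEN_DIR="$PWD/apache-maven-3.2.3"

# Put Maven's bin directory first on PATH for this shell session.
export PATH="$MAVEN_DIR/bin:$PATH"

# Confirm the shell now finds mvn; print a hint instead of failing if it does not.
if command -v mvn >/dev/null 2>&1; then
    mvn -v
else
    echo "mvn not found, check that $MAVEN_DIR/bin exists"
fi
```

To make the change permanent, the export line can be added to your shell profile.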

    4. Install Tika

       a. Enter the Tika base directory, e.g. cd tika-1.6

       b. Build Tika using Maven, e.g. mvn install

    5. Finished. Now you can test Tika

       a. From the base directory of Tika you can run

          java -jar tika-app/target/tika-app-1.6.jar -j [file]  

          The output should be the file's metadata encoded as JSON

          For more options, see http://tika.apache.org/1.6/gettingstarted.html or run java -jar tika-app/target/tika-app-1.6.jar --help
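
If you run the test command often, it can be wrapped in a small shell function. This is only a sketch: tika_metadata and TIKA_JAR are hypothetical names, and the default path assumes you are in the Tika base directory after a successful mvn install.

```shell
# Assumed variable name; defaults to the jar path produced by the build above.
TIKA_JAR="${TIKA_JAR:-tika-app/target/tika-app-1.6.jar}"

# Hypothetical wrapper: print the JSON metadata of one file using the -j option.
tika_metadata() {
    if [ ! -f "$TIKA_JAR" ]; then
        echo "Tika jar not found at $TIKA_JAR (run mvn install first)" >&2
        return 1
    fi
    java -jar "$TIKA_JAR" -j "$1"
}
```

For example, tika_metadata somefile.pdf would print the metadata of somefile.pdf (a placeholder file name), and the function returns a non-zero status if the jar has not been built yet.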