
A Brief Overview of Elasticsearch

This is the era of NoSQL databases, and Elasticsearch is one of the most popular. It is a document database that stores records as JSON. Once you understand the basics, Elasticsearch is very easy to work with, and a familiarity with JSON is all you need to get started. Here I explain a quick installation and the basic operations in Elasticsearch, which may help beginners get going. You don't need a server to learn Elasticsearch; you can install it on your desktop or laptop.

Step 1:
Download the Elasticsearch archive from the Elasticsearch website:
https://www.elastic.co/downloads/elasticsearch

Extract the file, go to the bin folder, and run the startup script:
Linux users should execute the elasticsearch script, and Windows users should execute the elasticsearch.bat file.
Elasticsearch will now be up, and by default its data will be stored under the folder $ELASTICSEARCH_HOME/data.
Open http://localhost:9200 in your browser. If you see JSON like the following, your Elasticsearch instance is running properly:

{
  "status" : 200,
  "name" : "Armor",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.4.0",
    "build_hash" : "bc94bd81298f81c656893ab1ddddd30a99356066",
    "build_timestamp" : "2014-11-05T14:26:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },
  "tagline" : "You Know, for Search"
}
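If you prefer a script to a browser, the same check can be done programmatically. Here is a minimal Python sketch; it parses the sample response above directly, so no server is needed (on a live setup you could fetch the same JSON with urllib.request.urlopen("http://localhost:9200")):

```python
import json

# The JSON returned by http://localhost:9200 (abbreviated from the sample above)
response = """
{
  "status" : 200,
  "name" : "Armor",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "1.4.0", "lucene_version" : "4.10.2" },
  "tagline" : "You Know, for Search"
}
"""

info = json.loads(response)
assert info["status"] == 200  # the instance is up and healthy
print(info["cluster_name"], info["version"]["number"])  # → elasticsearch 1.4.0
```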

Step 2:

Now we have to load some data into Elasticsearch. For that we need to send some REST requests.
Install a REST client plugin in your browser (in Mozilla Firefox, an add-on named RESTClient will help you; in Chrome, a plugin named Postman will help you).

Step 3:

Get some sample data to load. I am providing some sample data here, but you can use your own as well. In Elasticsearch, data is kept under an index. An index is analogous to a database in an RDBMS: where an RDBMS has databases, tables, and records, Elasticsearch has indexes, types, and documents. Every document has a unique key called the Id.

Here I am adding car details to an index "car", type "details", giving Ids starting from 1.
To add the first record, send a PUT request with the following details:

URL : http://localhost:9200/car/details/1
METHOD : PUT
BODY:
{
  "carName": "Fabia",
  "manufacturer": "Skoda",
  "type": "mini"
}

Similarly, you can add the second record:

URL : http://localhost:9200/car/details/2
METHOD : PUT
BODY:
{
  "carName": "Yeti",
  "manufacturer": "Skoda",
  "type": "XUV"
}

The full dataset is available on GitHub. You can add the remaining records in the same way.
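Instead of clicking through a REST client for every record, the PUT requests can be assembled from code. Below is a small Python sketch that builds the URL and body for each record; the helper function is my own, and actually sending the request needs a running instance (e.g. via urllib.request):

```python
import json

BASE = "http://localhost:9200"  # default Elasticsearch address

def build_index_request(index, doc_type, doc_id, doc):
    """Build the (method, url, body) triple for indexing one document."""
    url = "%s/%s/%s/%s" % (BASE, index, doc_type, doc_id)
    return ("PUT", url, json.dumps(doc))

cars = [
    {"carName": "Fabia", "manufacturer": "Skoda", "type": "mini"},
    {"carName": "Yeti",  "manufacturer": "Skoda", "type": "XUV"},
]

# Ids are assigned starting from 1, matching the manual requests above
for i, car in enumerate(cars, start=1):
    method, url, body = build_index_request("car", "details", i, car)
    print(method, url)
```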

Step 4:

If you want to update a record, just send a PUT request like the one used for loading, with the corresponding Id and the new document. Elasticsearch replaces the document with the new one, and you can see an incremented version number. The old document is no longer available after this.

Eg: Suppose I want to change the record with Id 1, renaming the car from Fabia to FabiaNew.

URL : http://localhost:9200/car/details/1
METHOD : PUT
BODY:
{
  "carName": "FabiaNew",
  "manufacturer": "Skoda",
  "type": "mini"
}

Step 5:
To list all the indexes in Elasticsearch, send the following GET request:

METHOD: GET
http://localhost:9200/_aliases

Step 6:
To retrieve data, we query Elasticsearch. To get everything, send either of the following requests; the first searches across all indexes, the second only the car index:

METHOD: GET
http://localhost:9200/_search
http://localhost:9200/car/_search
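A search response returns the matches under hits.hits, each with the original document in its _source field. The Python sketch below extracts the documents from a hand-written, abbreviated response in that shape (an actual response carries additional fields such as took, _shards, and _score):

```python
import json

# Abbreviated example of what /car/_search might return
response = """
{
  "hits": {
    "total": 2,
    "hits": [
      {"_index": "car", "_type": "details", "_id": "1",
       "_source": {"carName": "Fabia", "manufacturer": "Skoda", "type": "mini"}},
      {"_index": "car", "_type": "details", "_id": "2",
       "_source": {"carName": "Yeti", "manufacturer": "Skoda", "type": "XUV"}}
    ]
  }
}
"""

result = json.loads(response)
# Pull the original documents out of the hit envelopes
docs = [hit["_source"] for hit in result["hits"]["hits"]]
for doc in docs:
    print(doc["carName"])  # prints Fabia, then Yeti
```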

Step 7:

For more specific searches, you can try the following queries.

Query to get all the vehicles with manufacturer "Skoda":

METHOD: POST
http://localhost:9200/_search

{
  "query": {
    "query_string": {
      "fields": ["manufacturer"],
      "query": "Skoda"
    }
  }
}

Query to get all the vehicles with manufacturer Skoda or Renault:

METHOD: POST
http://localhost:9200/_search

{
  "query": {
    "query_string": {
      "fields": ["manufacturer"],
      "query": "Skoda OR Renault"
    }
  }
}
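Since these query bodies are plain JSON, they can also be generated from code; here is a Python sketch that builds the query_string body for any list of manufacturers (the helper function is my own):

```python
import json

def manufacturer_query(*names):
    """Build a query_string body matching any of the given manufacturers."""
    return {
        "query": {
            "query_string": {
                "fields": ["manufacturer"],
                "query": " OR ".join(names),
            }
        }
    }

body = manufacturer_query("Skoda", "Renault")
print(json.dumps(body))  # POST this body to http://localhost:9200/_search
```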

Simple Sentence Detector and Tokenizer Using OpenNLP

Machine learning is a branch of artificial intelligence in which we create and study systems that can learn from data. We all learn from our own experience or the experience of others; in machine learning, the system likewise learns from experience, which we feed to it as data.

So to draw an inference about something, we first train the system with a set of data. From that data the system learns and becomes capable of making inferences about new data. This is the basic principle behind machine learning.

There are a lot of machine learning toolkits available. Here I explain a simple program using Apache OpenNLP, a machine learning based toolkit made for processing natural language text. Many components are available in this toolkit; here I cover a simple sentence detector and a tokenizer.

Sentence Detector

Download the en-sent.bin model from the Apache OpenNLP website and add it to the classpath.


// Requires: opennlp.tools.sentdetect.{SentenceDetector, SentenceDetectorME, SentenceModel}
// and java.io.{IOException, InputStream}
public void sentenceSplitter() {
    SentenceDetector sentenceDetector = null;
    InputStream modelIn = null;
    try {
        // Load the pre-trained sentence model from the classpath
        modelIn = getClass().getResourceAsStream("en-sent.bin");
        final SentenceModel sentenceModel = new SentenceModel(modelIn);
        sentenceDetector = new SentenceDetectorME(sentenceModel);
    } catch (final IOException ioe) {
        ioe.printStackTrace();
        return; // without a model there is nothing to do
    } finally {
        if (modelIn != null) {
            try {
                modelIn.close();
            } catch (final IOException e) {
                // ignore failure on close
            }
        }
    }
    // Detect sentence boundaries and print each sentence on its own line
    final String[] sentences = sentenceDetector.sentDetect(
            "I am Amal. I am engineer. I like travelling and driving");
    for (final String sentence : sentences) {
        System.out.println(sentence);
    }
}

Instead of hard-coding the sentence inside the program, you can read it from an input file.

Tokenizer

Download the en-token.bin model from the Apache OpenNLP website and add it to the classpath.

// Requires: opennlp.tools.tokenize.{Tokenizer, TokenizerME, TokenizerModel}
// and java.io.{IOException, InputStream}
public void tokenizer() {
    // The model could also be read with new FileInputStream("en-token.bin")
    InputStream modelIn = getClass().getResourceAsStream("en-token.bin");
    try {
        final TokenizerModel model = new TokenizerModel(modelIn);
        final Tokenizer tokenizer = new TokenizerME(model);
        // Split the input text into tokens and print each one
        final String[] tokens = tokenizer.tokenize("Sample tokenizer program using java");
        for (final String token : tokens) {
            System.out.println(token);
        }
    } catch (final IOException e) {
        e.printStackTrace();
    } finally {
        if (modelIn != null) {
            try {
                modelIn.close();
            } catch (final IOException e) {
                // ignore failure on close
            }
        }
    }
}