Search Improvements (OS/ES support) #298

Draft: wants to merge 59 commits into master (changes shown from 33 commits)

Commits
8f369fc
267 lucene 8_6_0 DoublePoint rename
patrick-austin Jan 28, 2022
3770bbf
Add getReadableIds and use in filterReadAccess #277
patrick-austin Feb 11, 2022
44a9d80
Implement Elasticsearch Java client functions #267
patrick-austin Mar 9, 2022
7bf55e9
Add lucene.searchBlockSize and refactor getReadable #277
patrick-austin Mar 9, 2022
d972057
Fix change to lucenePopulateBlockSize #277
patrick-austin Mar 11, 2022
17cd54d
ElasticsearchApi passing bare minimum tests #267
patrick-austin Mar 17, 2022
939e421
Add documentation to altered methods #277
patrick-austin Mar 17, 2022
eed5d21
Merge branch '277_filterReadAccess_improvement' of https://github.com…
patrick-austin Mar 17, 2022
a22fe36
Enable field sorting for name and date fields #267
patrick-austin Mar 24, 2022
bfb7d7b
Replace abstract methods in SearchApi #267
patrick-austin Mar 24, 2022
b44091d
Merge branch '267_generic_SearchApi' into 267_search_endpoint
patrick-austin Mar 24, 2022
07a42a3
Add endpoint allowing fields and searchAfter #267
patrick-austin Mar 26, 2022
64d1af5
Add asserts to Lucene modify test #267
patrick-austin Apr 4, 2022
ba32387
Integration tests and fixes #267
patrick-austin Apr 6, 2022
3c0d3bc
Text fields and related entities #267
patrick-austin Apr 8, 2022
7efdc36
Enable Lucene facets #267
patrick-austin Apr 13, 2022
5e93b3d
Basic unit conversion tests #267
patrick-austin Apr 14, 2022
7eeb840
Add support for Opensearch requests #267
patrick-austin Apr 16, 2022
9450c4b
Integration test fixes and text analysis #267
patrick-austin Apr 28, 2022
901e3ad
Refactor SearchApi functions for clarity #267
patrick-austin Apr 28, 2022
3aa8d3a
SearchApi and unit conversion refactors #267
patrick-austin Apr 30, 2022
eb26a87
Add fields needed for DGS component #267
patrick-austin Jun 8, 2022
b228aa3
Update Search tests with Instruments #267
patrick-austin Jun 10, 2022
87ee1a3
Sparse string faceting fix #267
patrick-austin Jun 15, 2022
50d2dab
Filters and aborted search support #267
patrick-austin Jun 16, 2022
cbf2cf4
Return correct Json for aborted searches #267
patrick-austin Jun 17, 2022
5af7b18
Enable multivalue string facets #267
patrick-austin Jun 16, 2022
134303c
Refactors and Javadoc comments #267
patrick-austin Jun 21, 2022
b257f81
Support searches on sample name #267
patrick-austin Jul 13, 2022
c7d2a7a
SampleParameter, fileCount, value in range #267
patrick-austin Jul 22, 2022
c141e42
Add utility to population and timed aggregation #267
patrick-austin Aug 2, 2022
42bcef0
Improve timed file aggregation #267
patrick-austin Jul 24, 2022
e7a322c
Improved timeout and search syntax errors #267
patrick-austin Aug 5, 2022
1f59002
search.maxSearchTimeSeconds added to run.properties #267
patrick-austin Aug 5, 2022
fbc9b2a
Range check for lock #267
patrick-austin Aug 9, 2022
1c85317
Merge branch 'master' into 267_opensearch_support
patrick-austin Sep 5, 2022
37668c5
Add support for faceting DatasetTechnique #267
patrick-austin Sep 7, 2022
de8fd2d
Add deprecation warnings #267
patrick-austin Sep 9, 2022
60f30a6
Refactors in EntityBeanManager and SearchManager #267
patrick-austin Oct 3, 2022
6c9477a
Expand TestRS to cover getReadableIds change #277
patrick-austin Oct 5, 2022
de47467
Review changes, Opensearch refactors and fixes #267
patrick-austin Oct 12, 2022
e91214f
Update OpensearchQuery docstrings after refactor #267
patrick-austin Oct 21, 2022
03213aa
Apply suggestions from code review
patrick-austin Oct 24, 2022
906ba63
InvestigationFacilityCycle search support
patrick-austin Jan 23, 2023
8abef9a
Remove placeholder values for size and count #267
patrick-austin Jul 3, 2023
9278982
Merge branch '277_filterReadAccess_improvement' into 267_opensearch_s…
patrick-austin Sep 7, 2023
4fb3ac0
Merge branch 'master' into 267_opensearch_support
patrick-austin Sep 7, 2023
2589b52
6.1.0 release notes and icatadmin update
patrick-austin Sep 8, 2023
bfcb7b0
Index id as long instead of String #267
patrick-austin Sep 26, 2023
f8a977b
Special handling for InvestigationInstrument facets #267
patrick-austin Sep 28, 2023
3dac87b
Revert accidental deletion of Facility setters #267
patrick-austin Oct 5, 2023
efd1261
Add investigation null check in Sample.getDoc #267
patrick-austin Oct 6, 2023
f2e59e3
Tests for Investigation Sample filtering #267
patrick-austin Oct 10, 2023
aff1fc9
Return searchAfter when searching as root
patrick-austin Mar 14, 2024
a9b210d
Merge branch 'master' into 267_opensearch_support
patrick-austin Mar 15, 2024
b1a2514
Merge branch 'master' into 267_opensearch_support
patrick-austin Mar 19, 2024
d4bd8f9
Account for changes to IcatUnits
patrick-austin Mar 22, 2024
200cb3a
Merge branch 'master' into 267_opensearch_support
ajkyffin Oct 30, 2024
519c05d
Use icat-6.1 branch of icat-ansible in CI
ajkyffin Oct 30, 2024
18 changes: 9 additions & 9 deletions pom.xml
@@ -3,7 +3,7 @@

<groupId>org.icatproject</groupId>
<artifactId>icat.server</artifactId>
<version>5.0.1-SNAPSHOT</version>
<version>5.1.0-SNAPSHOT</version>
<packaging>war</packaging>
<name>ICAT Server</name>
<description>A metadata catalogue to support Large Facility experimental data,
@@ -113,13 +113,13 @@
<dependency>
<groupId>org.icatproject</groupId>
<artifactId>icat.utils</artifactId>
<version>4.16.1</version>
<version>4.17.0-SNAPSHOT</version>
</dependency>

<dependency>
<groupId>org.icatproject</groupId>
<artifactId>icat.client</artifactId>
<version>5.0.0</version>
<version>5.1.0-SNAPSHOT</version>
<scope>test</scope>
</dependency>

@@ -226,7 +226,8 @@
</excludes>
<systemPropertyVariables>
<javax.net.ssl.trustStore>${javax.net.ssl.trustStore}</javax.net.ssl.trustStore>
<luceneUrl>${luceneUrl}</luceneUrl>
<searchEngine>${searchEngine}</searchEngine>
<searchUrls>${searchUrls}</searchUrls>
</systemPropertyVariables>
<testFailureIgnore>false</testFailureIgnore>
</configuration>
@@ -244,7 +245,8 @@
<systemPropertyVariables>
<javax.net.ssl.trustStore>${javax.net.ssl.trustStore}</javax.net.ssl.trustStore>
<serverUrl>${serverUrl}</serverUrl>
<luceneUrl>${luceneUrl}</luceneUrl>
<searchEngine>${searchEngine}</searchEngine>
<searchUrls>${searchUrls}</searchUrls>
</systemPropertyVariables>
</configuration>
<executions>
@@ -324,7 +326,8 @@
<argument>src/test/scripts/prepare_test.py</argument>
<argument>${containerHome}</argument>
<argument>${serverUrl}</argument>
<argument>${luceneUrl}</argument>
<argument>${searchEngine}</argument>
<argument>${searchUrls}</argument>
</arguments>
</configuration>
<goals>
@@ -402,6 +405,3 @@
</reporting>

</project>
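With `luceneUrl` replaced by `searchEngine` and `searchUrls` in both `systemPropertyVariables` blocks, test code would read the new values as JVM system properties. A minimal sketch, assuming a hypothetical `TestSearchSettings` helper (not part of icat.server), that `searchUrls` may hold several whitespace-separated URLs, and that an unset engine should fall back to `LUCENE`:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical test helper: reads the system properties forwarded by the
// <systemPropertyVariables> blocks in pom.xml. Not part of icat.server.
class TestSearchSettings {

    // Engine name in upper case; falls back to LUCENE when unset or blank (an assumption).
    static String searchEngine() {
        String engine = System.getProperty("searchEngine");
        if (engine == null || engine.trim().isEmpty()) {
            return "LUCENE";
        }
        return engine.trim().toUpperCase(Locale.ROOT);
    }

    // Assumes several URLs may be given, separated by whitespace.
    static List<String> searchUrls() {
        String urls = System.getProperty("searchUrls");
        if (urls == null || urls.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(urls.trim().split("\\s+"));
    }
}
```

Normalising case and whitespace in one place keeps individual tests from each re-implementing the same parsing of the forwarded properties.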



18 changes: 10 additions & 8 deletions src/main/config/run.properties.example
@@ -42,14 +42,16 @@ notification.Datafile = CU
# Call logging setup
log.list = SESSION WRITE READ INFO

# Lucene
lucene.url = https://localhost:8181
lucene.populateBlockSize = 10000
lucene.directory = ${HOME}/data/icat/lucene
lucene.backlogHandlerIntervalSeconds = 60
lucene.enqueuedRequestIntervalSeconds = 5
# The entities to index with Lucene. For example, remove 'Datafile' and 'DatafileParameter' if the number of datafiles exceeds lucene's limit of 2^32 entries in an index
!lucene.entitiesToIndex = Datafile Dataset Investigation InvestigationUser DatafileParameter DatasetParameter InvestigationParameter Sample
# Search Engine
# LUCENE, OPENSEARCH and ELASTICSEARCH engines are supported; the latter two are considered experimental
search.engine = LUCENE
search.urls = https://localhost:8181
search.populateBlockSize = 10000
search.directory = ${HOME}/data/icat/search
search.backlogHandlerIntervalSeconds = 60
search.enqueuedRequestIntervalSeconds = 5
# The entities to index with the search engine. For example, remove 'Datafile' and 'DatafileParameter' if the number of datafiles exceeds Lucene's limit of about 2^31 documents in an index
!search.entitiesToIndex = Datafile Dataset Investigation InvestigationUser DatafileParameter DatasetParameter InvestigationParameter Sample

# List members of cluster
!cluster = http://vm200.nubes.stfc.ac.uk:8080 https://smfisher:8181
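The renamed `search.*` keys can be read with `java.util.Properties`, which already treats lines starting with `#` or `!` as comments. The sketch below is illustrative only: the `SearchProperties` class is hypothetical rather than ICAT's own property handling, and the defaults (`LUCENE`, `10000`) and the whitespace-separated form of `search.urls` are assumptions taken from the example values above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

// Hypothetical holder for the search.* keys in run.properties; icat.server's
// own property handling is not reproduced here.
class SearchProperties {

    enum Engine { LUCENE, OPENSEARCH, ELASTICSEARCH }

    final Engine engine;
    final List<String> urls;
    final int populateBlockSize;

    SearchProperties(Properties props) {
        // valueOf throws IllegalArgumentException for an unsupported engine name.
        String engineName = props.getProperty("search.engine", "LUCENE").trim();
        engine = Engine.valueOf(engineName.toUpperCase(Locale.ROOT));

        // Assumption: several URLs are separated by whitespace, like other list-valued keys.
        String urlList = props.getProperty("search.urls", "").trim();
        if (urlList.isEmpty()) {
            throw new IllegalArgumentException("search.urls must not be empty");
        }
        urls = Arrays.asList(urlList.split("\\s+"));

        // Default mirrors the example value in run.properties.example.
        populateBlockSize = Integer.parseInt(props.getProperty("search.populateBlockSize", "10000").trim());
    }

    // Lines starting with '#' or '!' are comments, so "!search.entitiesToIndex"
    // stays disabled exactly as in the example file.
    static SearchProperties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SearchProperties(props);
    }
}
```

Mapping the engine name onto an enum means a typo such as `OPENSERACH` fails at startup instead of silently selecting a default.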
115 changes: 104 additions & 11 deletions src/main/java/org/icatproject/core/entity/Datafile.java
@@ -3,7 +3,9 @@
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.json.stream.JsonGenerator;
import javax.persistence.CascadeType;
@@ -20,7 +22,10 @@
import javax.persistence.UniqueConstraint;
import javax.xml.bind.annotation.XmlRootElement;

import org.icatproject.core.manager.LuceneApi;
import org.icatproject.core.IcatException;
import org.icatproject.core.manager.EntityInfoHandler;
import org.icatproject.core.manager.EntityInfoHandler.Relationship;
import org.icatproject.core.manager.search.SearchApi;

@Comment("A data file")
@SuppressWarnings("serial")
@@ -77,6 +82,8 @@ public class Datafile extends EntityBaseBean implements Serializable {
@OneToMany(cascade = CascadeType.ALL, mappedBy = "sourceDatafile")
private List<RelatedDatafile> sourceDatafiles = new ArrayList<RelatedDatafile>();

private static final Map<String, Relationship[]> documentFields = new HashMap<>();

/* Needed for JPA */
public Datafile() {
}
@@ -195,25 +202,111 @@ public void setSourceDatafiles(List<RelatedDatafile> sourceDatafiles) {

@Override
public void getDoc(JsonGenerator gen) {
StringBuilder sb = new StringBuilder(name);
SearchApi.encodeString(gen, "name", name);
if (description != null) {
sb.append(" " + description);
SearchApi.encodeString(gen, "description", description);
}
if (location != null) {
SearchApi.encodeString(gen, "location", location);
}
if (doi != null) {
sb.append(" " + doi);
SearchApi.encodeString(gen, "doi", doi);
}
if (fileSize != null) {
SearchApi.encodeLong(gen, "fileSize", fileSize);
} else {
SearchApi.encodeLong(gen, "fileSize", 0L);
}
SearchApi.encodeLong(gen, "fileCount", 1L); // Always 1, but makes sorting on fields consistent
if (datafileFormat != null) {
sb.append(" " + datafileFormat.getName());
datafileFormat.getDoc(gen);
}
LuceneApi.encodeTextfield(gen, "text", sb.toString());
if (datafileModTime != null) {
LuceneApi.encodeStringField(gen, "date", datafileModTime);
SearchApi.encodeLong(gen, "date", datafileModTime);
} else if (datafileCreateTime != null) {
LuceneApi.encodeStringField(gen, "date", datafileCreateTime);
SearchApi.encodeLong(gen, "date", datafileCreateTime);
} else {
LuceneApi.encodeStringField(gen, "date", modTime);
SearchApi.encodeLong(gen, "date", modTime);
}
SearchApi.encodeString(gen, "id", id);
if (dataset != null) {
SearchApi.encodeString(gen, "dataset.id", dataset.id);
SearchApi.encodeString(gen, "dataset.name", dataset.getName());
Sample sample = dataset.getSample();
if (sample != null) {
sample.getDoc(gen);
}
Investigation investigation = dataset.getInvestigation();
if (investigation != null) {
SearchApi.encodeString(gen, "investigation.id", investigation.id);
SearchApi.encodeString(gen, "investigation.name", investigation.getName());
SearchApi.encodeString(gen, "visitId", investigation.getVisitId());
if (investigation.getStartDate() != null) {
SearchApi.encodeLong(gen, "investigation.startDate", investigation.getStartDate());
} else if (investigation.getCreateTime() != null) {
SearchApi.encodeLong(gen, "investigation.startDate", investigation.getCreateTime());
}
}
}
LuceneApi.encodeStoredId(gen, id);
LuceneApi.encodeStringField(gen, "dataset", dataset.id);
}

/**
* Gets the fields used in the search component for this entity, and the
* relationships that would restrict the content of those fields.
*
* @return Map of field names (as they appear on the search document) against
* the Relationships that need to be allowed for that field to be
* viewable. If there are no restrictive relationships, then the value
* will be null.
* @throws IcatException If the EntityInfoHandler cannot find one of the
* Relationships.
*/
public static Map<String, Relationship[]> getDocumentFields() throws IcatException {
if (documentFields.size() == 0) {
EntityInfoHandler eiHandler = EntityInfoHandler.getInstance();
Relationship[] datafileFormatRelationships = {
eiHandler.getRelationshipsByName(Datafile.class).get("datafileFormat") };
Relationship[] datasetRelationships = {
eiHandler.getRelationshipsByName(Datafile.class).get("dataset") };
Relationship[] investigationRelationships = {
eiHandler.getRelationshipsByName(Datafile.class).get("dataset"),
eiHandler.getRelationshipsByName(Dataset.class).get("investigation") };
Relationship[] instrumentRelationships = {
eiHandler.getRelationshipsByName(Datafile.class).get("dataset"),
eiHandler.getRelationshipsByName(Dataset.class).get("investigation"),
eiHandler.getRelationshipsByName(Investigation.class).get("investigationInstruments"),
eiHandler.getRelationshipsByName(InvestigationInstrument.class).get("instrument") };
Relationship[] sampleRelationships = {
eiHandler.getRelationshipsByName(Datafile.class).get("dataset"),
eiHandler.getRelationshipsByName(Dataset.class).get("sample") };
Relationship[] sampleTypeRelationships = {
eiHandler.getRelationshipsByName(Datafile.class).get("dataset"),
eiHandler.getRelationshipsByName(Dataset.class).get("sample"),
eiHandler.getRelationshipsByName(Sample.class).get("type") };
documentFields.put("name", null);
documentFields.put("description", null);
documentFields.put("location", null);
documentFields.put("doi", null);
documentFields.put("date", null);
documentFields.put("fileSize", null);
documentFields.put("fileCount", null);
documentFields.put("id", null);
documentFields.put("dataset.id", null);
documentFields.put("dataset.name", datasetRelationships);
documentFields.put("sample.id", datasetRelationships);
documentFields.put("sample.name", sampleRelationships);
documentFields.put("sample.investigation.id", sampleRelationships);
documentFields.put("sample.type.id", sampleRelationships);
documentFields.put("sample.type.name", sampleTypeRelationships);
documentFields.put("investigation.id", datasetRelationships);
documentFields.put("investigation.name", investigationRelationships);
documentFields.put("investigation.startDate", investigationRelationships);
documentFields.put("visitId", investigationRelationships);
documentFields.put("datafileFormat.id", null);
documentFields.put("datafileFormat.name", datafileFormatRelationships);
documentFields.put("InvestigationInstrument instrument.id", instrumentRelationships);
}
return documentFields;
}

}
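`getDocumentFields()` pairs each document field with the relationships a user must be allowed to follow before that field is visible, with `null` meaning the field is unrestricted. The sketch below models that check in isolation. Relationships are plain strings such as `"Datafile.dataset"` rather than ICAT's `Relationship` objects, and both the `FieldVisibility` class and the idea of a precomputed set of allowed relationships are assumptions for illustration, not how icat.server performs its authorisation.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of filtering search-document fields by the
// relationships that guard them. ICAT's Relationship type and rule checks are not modelled.
class FieldVisibility {

    // Returns the field names whose every required relationship is in allowedRelationships.
    static Set<String> visibleFields(Map<String, String[]> documentFields, Set<String> allowedRelationships) {
        Set<String> visible = new TreeSet<>();
        for (Map.Entry<String, String[]> entry : documentFields.entrySet()) {
            String[] required = entry.getValue();
            // A null value means no relationship restricts the field.
            if (required == null || allowedRelationships.containsAll(Arrays.asList(required))) {
                visible.add(entry.getKey());
            }
        }
        return visible;
    }
}
```

Returning a `TreeSet` keeps the result sorted, which makes log output and test comparisons deterministic.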
14 changes: 14 additions & 0 deletions src/main/java/org/icatproject/core/entity/DatafileFormat.java
@@ -2,8 +2,12 @@

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import javax.json.stream.JsonGenerator;
import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
@@ -14,6 +18,8 @@
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

import org.icatproject.core.manager.search.SearchApi;

@Comment("A data file format")
@SuppressWarnings("serial")
@Entity
@@ -51,6 +57,8 @@ public void setFacility(Facility facility) {
@Column(name = "VERSION", nullable = false)
private String version;

public static Set<String> docFields = new HashSet<>(Arrays.asList("datafileFormat.name", "datafileFormat.id"));

/* Needed for JPA */
public DatafileFormat() {
}
@@ -95,4 +103,10 @@ public void setVersion(String version) {
this.version = version;
}

@Override
public void getDoc(JsonGenerator gen) {
SearchApi.encodeString(gen, "datafileFormat.name", name);
SearchApi.encodeString(gen, "datafileFormat.id", id);
}

}
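`Datafile.getDoc` writes its own fields and then hands the same generator to `datafileFormat.getDoc(gen)`, which appends keys already prefixed with `datafileFormat.`, so one flat document carries both entities. The toy model below reproduces that pattern along with the `fileSize` default and the date precedence from `Datafile.getDoc`. `FlatDoc`, `FormatStub` and `DatafileStub` are hypothetical stand-ins: a `LinkedHashMap` replaces `JsonGenerator`, and indexing dates as epoch milliseconds is an assumption about what `SearchApi.encodeLong` does with a `Date`.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for JsonGenerator: collects one flat search document.
class FlatDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();

    void encodeString(String name, Object value) {
        fields.put(name, String.valueOf(value));
    }

    void encodeLong(String name, long value) {
        fields.put(name, value);
    }
}

// Hypothetical stand-in for DatafileFormat, keeping only the indexed fields.
class FormatStub {
    final long id;
    final String name;

    FormatStub(long id, String name) {
        this.id = id;
        this.name = name;
    }

    // Mirrors DatafileFormat.getDoc: the related entity writes already-prefixed keys.
    void getDoc(FlatDoc doc) {
        doc.encodeString("datafileFormat.name", name);
        doc.encodeString("datafileFormat.id", id);
    }
}

// Hypothetical stand-in for Datafile with a subset of its indexed fields.
class DatafileStub {
    final String name;
    final Long fileSize;
    final FormatStub format;
    final Date datafileModTime;
    final Date datafileCreateTime;
    final Date modTime;

    DatafileStub(String name, Long fileSize, FormatStub format,
            Date datafileModTime, Date datafileCreateTime, Date modTime) {
        this.name = name;
        this.fileSize = fileSize;
        this.format = format;
        this.datafileModTime = datafileModTime;
        this.datafileCreateTime = datafileCreateTime;
        this.modTime = modTime;
    }

    void getDoc(FlatDoc doc) {
        doc.encodeString("name", name);
        // A missing size is indexed as 0, and fileCount is always 1.
        doc.encodeLong("fileSize", fileSize != null ? fileSize : 0L);
        doc.encodeLong("fileCount", 1L);
        if (format != null) {
            format.getDoc(doc); // related entity appends its fields to the same document
        }
        // Precedence: datafileModTime, then datafileCreateTime, then modTime (assumed non-null).
        Date date = datafileModTime != null ? datafileModTime
                : datafileCreateTime != null ? datafileCreateTime : modTime;
        doc.encodeLong("date", date.getTime());
    }
}
```

Because related entities write fully qualified keys themselves, the parent never needs to know which fields a related entity contributes.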
src/main/java/org/icatproject/core/entity/DatafileParameter.java
@@ -13,8 +13,8 @@

import org.icatproject.core.IcatException;
import org.icatproject.core.manager.EntityBeanManager.PersistMode;
import org.icatproject.core.manager.search.SearchApi;
import org.icatproject.core.manager.GateKeeper;
import org.icatproject.core.manager.LuceneApi;

@Comment("A parameter associated with a data file")
@SuppressWarnings("serial")
@@ -56,7 +56,7 @@ public void setDatafile(Datafile datafile) {
@Override
public void getDoc(JsonGenerator gen) {
super.getDoc(gen);
LuceneApi.encodeSortedDocValuesField(gen, "datafile", datafile.id);
SearchApi.encodeString(gen, "datafile.id", datafile.id);
}

}