wiki:Other topics

Version 5 (modified by 231035, 2 days ago) ( diff )

--

Other Topics

SQL Performance, adding Indexes

To analyze the performance of our queries, we consider several different use cases.

The testing approach is as follows:

We first insert a large number of new records into the tables (around 1~2 million rows) so that using an index becomes meaningful and beneficial when applied correctly. It is clear that for a table with only a small number of records, indexing would not provide a significant advantage.

Before creating the index, we execute the queries 10 times using EXPLAIN ANALYZE. We then calculate the average Execution Time and record the query plan, so we can later compare it with the new plan after the index is introduced.

After adding the index, we run the same query again 10 times and compare the results in terms of execution time and query plan.

Scenario 1

Making appointments table have 1milion rows

ALTER TABLE appointments DISABLE TRIGGER trg_appointments_no_overlap;

INSERT INTO appointments (clinic_id, animal_id, responsible_owner_id, status, date_time, notes)
SELECT
    c.clinic_id,
    a.animal_id,
    a.owner_id,
    CASE
        WHEN dt < now() THEN
            (ARRAY['DONE','CANCELLED'])[floor(random()*2)+1]
        ELSE
            'CONFIRMED'
        END,
    dt,
    'Auto generated'
FROM generate_series(1, 1000000) gs
         JOIN LATERAL (
    SELECT clinic_id FROM vet_clinics ORDER BY random() LIMIT 1
    ) c ON true
         JOIN LATERAL (
    SELECT animal_id, owner_id FROM animals ORDER BY random() LIMIT 1
    ) a ON true
         JOIN LATERAL (
    SELECT NOW() - (random() * INTERVAL '90 days') AS dt
    ) t ON true;

ALTER TABLE appointments ENABLE TRIGGER trg_appointments_no_overlap;
  • I disabled the overlap trigger for easier testing, after I have inserted all the rows I enabled it again

Executing the query 10 times to see average execution time

EXPLAIN ANALYZE
SELECT *
FROM appointments
WHERE status = 'CONFIRMED'
  AND date_time < now() - interval '45 minutes';

After the first execution

Gather  (cost=1000.00..21156.46 rows=1 width=62) (actual time=0.646..81.896 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on appointments  (cost=0.00..20156.36 rows=1 width=62) (actual time=44.874..69.423 rows=0 loops=3)
        Filter: (((status)::text = 'CONFIRMED'::text) AND (date_time < (now() - '00:45:00'::interval)))
        Rows Removed by Filter: 333334
Planning Time: 0.726 ms
Execution Time: 81.930 ms
  • Average execution time WITHOUT indexes

61.1925ms

Index Scan using idx_appointments_status_date_time on appointments  (cost=0.43..7.88 rows=1 width=62) (actual time=0.089..0.090 rows=1 loops=1)
  Index Cond: (((status)::text = 'CONFIRMED'::text) AND (date_time < (now() - '00:45:00'::interval)))
Planning Time: 1.020 ms
Execution Time: 0.122 ms
  • Average execution time WITH indexes

0.093ms Because the execution time has been optimized with the use of index we will keep the index.

Scenario 2

Making listings table have 2milion rows

INSERT INTO listings (animal_id, owner_id, status, price, created_at)
SELECT
    a.animal_id,
    a.owner_id,
    CASE
        WHEN random() < 0.7 THEN 'DRAFT'
        WHEN random() < 0.9 THEN 'ARCHIVED'
        ELSE 'SOLD'
        END,
    (random() * 1000 + 50)::numeric(10,2),
    NOW() - (random() * INTERVAL '120 days')
FROM generate_series(1, 2000000) gs
         JOIN LATERAL (
    SELECT animal_id, owner_id
    FROM animals
    ORDER BY random()
    LIMIT 1
    ) a ON true;

Executing the query 10 times to see average execution time

EXPLAIN ANALYZE
SELECT *
FROM listings
WHERE status = 'DRAFT'
  AND created_at < now() - interval '30 days';

After the first execution

Seq Scan on listings  (cost=0.00..60619.18 rows=1058297 width=576) (actual time=0.014..453.562 rows=1051498 loops=1)
  Filter: (((status)::text = 'DRAFT'::text) AND (created_at < (now() - '30 days'::interval)))
  Rows Removed by Filter: 948511
Planning Time: 0.544 ms
Execution Time: 487.292 ms
  • Average execution time WITHOUT indexes

494.8984ms

Seq Scan on listings  (cost=0.00..60619.18 rows=1058332 width=576) (actual time=0.044..473.306 rows=1051534 loops=1)
  Filter: (((status)::text = 'DRAFT'::text) AND (created_at < (now() - '30 days'::interval)))
  Rows Removed by Filter: 948475
Planning Time: 0.092 ms
Execution Time: 507.535 ms

  • Average execution time WITH indexes

500.327ms There isn't a change in the execution time, the query doesn't even use the index because its's cheaper to just use seq scan, we won't be keeping this index.

Security measures

SQL injection prevention

@Query(value = """
        SELECT *
        FROM get_top_active_users(:startTs, :endTs)
        """, nativeQuery = true)
    List<UserActivityRankingProjection> getTopActiveUsers(
            @Param("startTs") LocalDateTime startTs,
            @Param("endTs") LocalDateTime endTs
    );
  • Native queries are executed using named parameters such as :startTs and :endTs, which are bound by Spring Data JPA. This prevents SQL injection because input values are treated as parameters, not as executable SQL.

Keeping passwords

*Password security is implemented using BCryptPasswordEncoder, which hashes user passwords before storing them in the database. This ensures that passwords are not stored in plain text and provides protection against brute-force and dictionary attacks.

    @Bean
    public PasswordEncoder passwordEncoder() {
        return new BCryptPasswordEncoder();
    }

CORS

  • The application uses a CORS configuration to control which frontend clients are allowed to communicate with the backend API. In development, requests are allowed only from local frontend origins such as http://localhost:*(this will be changed to the correct URL of the hosted site). The configuration defines the permitted HTTP methods, including GET, POST, PUT, PATCH, DELETE, and OPTIONS.
    @Bean
        public CorsConfigurationSource corsConfigurationSource() {
            CorsConfiguration configuration = new CorsConfiguration();
            configuration.setAllowedOriginPatterns(List.of("http://localhost:*"));
            configuration.setAllowedMethods(Arrays.asList("GET", "POST", "PUT","PATCH", "DELETE", "OPTIONS"));
            configuration.setAllowedHeaders(Arrays.asList("*"));
            configuration.setAllowCredentials(true);
    
            UrlBasedCorsConfigurationSource source = new UrlBasedCorsConfigurationSource();
            source.registerCorsConfiguration("/**", configuration);
            return source;
        }
    

Other developments

Note: See TracWiki for help on using the wiki.